Intel X86 Encoder Decoder (intelxed.github.io)
146 points by luu on Dec 17, 2016 | 40 comments


I wonder if this was released due to Xen's recent x86 instruction emulation bugs.

http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2016-9932

http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2016-9383


Probably not (but who knows, you could probably use this as part of a fuzzer). Instruction emulation is a superset of instruction decoding. You need to decode and then emulate the behavior.
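To illustrate the decode-then-emulate split described above, here is a minimal sketch in Python. It handles only three hand-picked one-opcode instructions (`nop`, `ret`, and `mov r32, imm32`) with no prefixes, and the names `Insn`, `decode`, and `emulate` are made up for this example; they are not XED's API or anything from Xen or KVM.

```python
from dataclasses import dataclass
from typing import Dict

# Register order matches the low three bits of the B8+r opcode.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


@dataclass
class Insn:
    """Result of decoding: what the bytes mean, not what they do."""
    mnemonic: str
    length: int
    reg: str = ""
    imm: int = 0


def decode(code: bytes, offset: int = 0) -> Insn:
    """Decode one instruction from a tiny, hypothetical x86 subset."""
    opcode = code[offset]
    if opcode == 0x90:
        return Insn("nop", 1)
    if opcode == 0xC3:
        return Insn("ret", 1)
    if 0xB8 <= opcode <= 0xBF:  # mov r32, imm32
        raw = code[offset + 1:offset + 5]
        if len(raw) != 4:
            raise ValueError("truncated immediate")
        return Insn("mov", 5, REG32[opcode - 0xB8], int.from_bytes(raw, "little"))
    raise ValueError(f"unsupported opcode {opcode:#04x}")


def emulate(code: bytes) -> Dict[str, int]:
    """Emulation = decoding plus applying each instruction's effect to state."""
    regs = {name: 0 for name in REG32}
    pc = 0
    while pc < len(code):
        insn = decode(code, pc)      # step 1: decode
        if insn.mnemonic == "mov":   # step 2: emulate the behavior
            regs[insn.reg] = insn.imm
        elif insn.mnemonic == "ret":
            break
        pc += insn.length
    return regs
```

Even in this toy, the emulator carries state and semantics that the decoder never touches, which is where the extra room for bugs comes from.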

Aside: Instruction emulation is pretty finicky and bug prone. I'm not too familiar with Xen, but KVM has had at least 10 instruction emulation CVEs. There were talks at both KVM Forum and Xen Summit last summer mentioning the sketchiness of instruction emulation.


Why would the release of this library have anything to do with Xen vulnerabilities?


Those CVEs are private. Could you provide a public source?



Seems like this would make one able to write the object-code codegen phase of a compiler at a rather higher level (at least if you're only targeting x86, or are willing to write similar libraries for your other arch targets.)

Or, to put it another way: looks like a good library for adding "just a bit of JIT" to an interpreter, without going full LLVM.
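For a sense of what the encoding half of such a library saves you from writing by hand, here is a rough sketch in Python that emits the bytes for `mov eax, imm32; ret`. The helper names are hypothetical and are not XED functions; a real JIT would also need to copy the bytes into executable memory, which is not shown.

```python
import struct


def encode_mov_r32_imm32(reg_index: int, value: int) -> bytes:
    """Encode `mov r32, imm32` as opcode B8+r followed by a little-endian imm32."""
    if not 0 <= reg_index <= 7:
        raise ValueError("reg_index must be in 0..7")
    return bytes([0xB8 + reg_index]) + struct.pack("<I", value & 0xFFFFFFFF)


def encode_ret() -> bytes:
    """Encode a near `ret` (single byte C3)."""
    return b"\xC3"


def emit_return_constant(value: int) -> bytes:
    """Machine code for a function body that returns `value` in eax."""
    return encode_mov_r32_imm32(0, value) + encode_ret()
```

With an encoder library, the interpreter would describe the instruction and operands and let the library pick the bytes, instead of hard-coding opcodes like this.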


"As new instructions are made public"? WTF does that mean? We're not even allowed to know the full instruction set of an X86 CPU?


Not allowed yet.

Ex-Intel here. It takes years to gestate new instructions. First specs are in controlled documents available to Intel employees only. Eventually, when things are nailed down, preliminary specs are available under an NDA where a VP approves/signs the Intel side of the NDA. Tool vendors will get those specs. Finally, when the chip is announced the previously NDA documents become public. I was a CPU designer at multiple companies, and everyone follows a similar process.


What about the infamous i386 LOADALL instruction? Certainly there is at least one that is kept secret.

http://www.rcollins.org/articles/loadall/tspec_a3_doc.html


You're just talking about the design phase of a new ISA extension. I think the parent was under the impression that there were instructions in released processors that were undocumented.


And that impression would be correct.

Furthermore, if you are a big enough customer you can also negotiate for special ISA extensions yourself (can't provide a citation, but I heard this from people at Intel). Then, of course, you'd be the only one with the documentation to take advantage of them.


There is at least one Intel processor with a public erratum to an instruction (or other mechanism?) that is not public. It's a timed MWAIT, I think.


That was exactly my impression. I don't have a problem with new instructions in new processors being undocumented at first, but purposely undocumented instructions are troubling and would push me away from Intel for new designs.


So, you are the decision maker for whether or not Intel CPUs get designed into your product? If so, I'm sure several Intel salespeople have you on speed dial.

The 286 had purposely undocumented instructions, of this sort: "Ooops this is b0rk3d. It will always be b0rk3d. Let's pretend it didn't happen." So for generations there were holes in the opcode map that people tip-toed around. Especially since Intel (meaning the internal grey-beard collective) also forgot exactly what those opcodes were and what they were supposed to do. You care, why, exactly?

It's not like the NSA slips extra opcodes into executables that you compile with your own compiler in order to spy on you. They have much easier ways to spy on you.

Also, it's not like it is that hard to throw unused opcodes at the decoder and see which ones give you the illegal instruction exception, and which ones do something else. You now have a homework assignment. Have fun, let us know what you find.


There are.


The story of FXSAVE/SSE (I wonder if you were there for it):

http://web.archive.org/web/20000817193452/http://www.tbcnet....

http://web.archive.org/web/20000817084210/http://www.tbcnet....

Obviously, XSAVE/AVX was handled better.


> previously NDA documents become public

I'd say some are eventually made public. I went to the IDF in San Francisco expecting this sort of information on Skylake. What a waste of a day.

Not that I'm complaining. Intel's information is excellent. It just arrives when it arrives and it is what it is at that point in time. There are many ways, all involving effort, to glean more information: Agner Fog, articles, patents, .... BTW, Intel folks are helpful on their dev board.


XED author here...

@dbcurtis: Yes, that's what I meant. Thanks for clarifying.


Read this as "as upcoming instructions in new chips are documented"


Yes, that's likely what the author meant. But, also, the author might have access to development specs or early samples not publicly announced yet.


There is an instruction manual for that, many thousands of pages long, that is freely downloadable from Intel. It's re-posted on HN from time to time.


Speaking of which... I'd really love to have a reference manual on one of the early x86 processors. Maybe the 386, since it is one of the iconic ones (or the one I have the most nostalgia for) that also started to get interesting features like protected mode, but still wasn't so complicated as to be a multi-volume, 5,000-page series (maybe?).

Back in the 1990s, when I was doing assembly language on the Amiga, Motorola sent me the official 68000 manual for free when I called and asked them. It was a really cool book to have on the shelf and occasionally leaf through.

I looked at abebooks.com, and there are Intel books, but I wouldn't know exactly which one would be the reference manual. Anyone got an ISBN?


For the 80386 there are two manuals, a Hardware Reference Manual (ISBN 1-55512-069-5) and a Programmer's Reference Manual (ISBN 1-55512-022-9). They come up on eBay from time to time; I think I paid $30 for my pair.


That's exactly what I was looking for, thanks. Found them on abebooks.com easily. Awesome 1980s neon covers.


Intel used to give away hardcopy manuals for free too, but stopped doing that around 2010:

http://styx.head-crash.de/stuff/intel_manuals.jpg

Now they still have hardcopy, although it's basically just paying for someone else to print and bind the PDFs for you:

http://www.lulu.com/shop/search.ep?contributorId=1030088


Try Bitsavers (an amazing work in the service of historic preservation): http://www.bitsavers.org/

Previously discussed here: https://news.ycombinator.com/item?id=10143295

Internet Archive collection (the Intel documents part): https://archive.org/details/bitsavers_intel

Library of Congress reference: https://www.loc.gov/item/lcwa00096459/


Don't expect any management engine/system management mode instructions any time soon/ever.


No reason to hide them; presumably, as with firmware updates, you have to load an Intel signing key into a few registers along with your IME/SMM opcode to get it to execute.


Huh? SMM is fully documented at the CPU level. QEMU can even emulate it. Not sure what you mean by "management mode".


Sorry, I meant the undocumented MEI and SMM features.


This is (partially) meant to be used for chips that don't exist yet. As future instructions are made public, software vendors (like compiler vendors) can use this to be ready on day 1 when the chips come out.


This is awesome. I thought it was closed source. I wonder how close it is to the hardware.

Can I assume that whatever is parsed by XED is going to be parsed in the same way by real CPUs?


XED author here...

That's one goal certainly. Not claiming it is perfect. I have been working on it a long time and many cycles have been run on it between Pin, Intel SDE and some internal simulators that boot OSes, etc.


How does this compare to ARM's VIXL [1]?

[1] https://github.com/armvixl/vixl


About time they open sourced it. Funny how it is Python according to GitHub :)


Python is used to generate C code from tables. Python is also used to build, package, and test.


Why is that funny? Python seems like the perfect language for this kind of thing. You can see the logic with minimal noise.


It's funny since it's not very accurate: the Python is used to generate C code from tables describing instruction encodings (that's my understanding, haven't cloned this and built it). But there is plenty of actual C code in there too, and the resulting "thing" is a C library.
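As a rough illustration of the "Python generates C from tables" pattern, here is a minimal sketch. The table layout, the `insn_info_t` struct, and the `generate_c_table` function are invented for this example and do not reflect XED's actual data files or generator.

```python
# Hypothetical instruction table: (name, opcode byte, encoded length in bytes).
INSTRUCTIONS = [
    ("NOP", 0x90, 1),
    ("RET", 0xC3, 1),
    ("MOV_EAX_IMM32", 0xB8, 5),
]


def generate_c_table(rows):
    """Return C source text defining a static lookup table for `rows`."""
    lines = [
        "/* Generated file: do not edit by hand. */",
        "typedef struct { const char *name; unsigned char opcode; "
        "unsigned char length; } insn_info_t;",
        "",
        "static const insn_info_t insn_table[] = {",
    ]
    for name, opcode, length in rows:
        # Doubled braces produce literal { and } in the f-string.
        lines.append(f'    {{ "{name}", 0x{opcode:02X}, {length} }},')
    lines.append("};")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(generate_c_table(INSTRUCTIONS), end="")
```

The Python only runs at build time; what ships is the generated C, which is why the end result is still a C library.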

Python would not be as suitable as the full implementation language, since C is much more easily accessed from other languages, which is a good property for a library like this.


Looks handy. Thanks for posting, Dan.


Something that is universal is not remarkable. People will not make as much use of an unremarkable opportunity as they will of a remarkable one.



