I saw the repeating 'A' at the end of the base64 text and thought "it's not even 512 bytes; it's smaller!"
That said, the title is just a little clickbaity --- it's a C-subset compiler, and more accurately a JIT interpreter. There also appears to be no attempt at operator precedence. Nonetheless, it's still an impressive technical achievement and shows the value of questioning common assumptions.
Finally, I feel tempted to offer a small size optimisation:

    sub ax,2

is 3 bytes, whereas

    dec ax
    dec ax

is 2 bytes.
You may be able to use single-byte xchg's with ax instead of movs. The other thing that helps code density a lot in 16-bit code is taking advantage of the addressing modes and LEA to do 3-operand add-immediates where possible.
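For example (a rough sketch; byte counts assume the standard 8086 encodings, and the register choices are just illustrative):

    xchg bx, ax        ; 1 byte (93), vs. "mov ax, bx" at 2 bytes
    lea  ax, [bx+4]    ; 3 bytes (8D 47 04): ax = bx + 4 in one
                       ; instruction, without touching bx or the flags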
Good tip! Yeah, there’s ~20 bytes unused at the end. I kept finding ways to squeeze out a few more and had to tell myself to stop and just publish it already. You could take this further if you really wanted. But it’s already sufficiently absurd.
The book that had the Cain/Hendrix "Small C" compiler, runtime library, assembler and tools was a fantastic resource that taught me C and compiler construction at the same time.
In general, reading (lots of) source code is a good way to learn how to do things in a new language, i.e. to move from the lexical and syntactical level to the level of idioms for problem-solving. On reflection, I find it strange that in programming teaching, larger pieces of existing well-written source code are never discussed/explained/critiqued.
Small C was the inspiration for SubC, a project of mine. It is pretty much a cleaned-up and slightly extended rewrite of Small C that compiles with few warnings on modern compilers. Also comes with a book.
An interpreter with a JIT compiler is able to do more optimizations because it has the runtime context to make decisions, while an AOT (ahead-of-time) compiler knows nothing about what happens at runtime.
This is why some JIT'd languages (like JavaScript) can sometimes be faster than C.
Can you give some simple example for the folks in the back of how JIT'd languages can be faster than C? I think most people are under the impression that statically compiled languages are "always faster."
> Can you give some simple example for the folks in the back of how JIT'd languages can be faster than C?
If the JIT has instrumentation that analyzes execution traces, then it can notice that a call through a function pointer always goes to the same function. It can then recompile that code to use a static function call instead, which is considerably faster.
Basically, it can perform a similar set of optimizations to a static compiler plus profiling information in basic cases. In more advanced scenarios, it specializes the program based on the runtime input, which profiling can't do for all possible inputs, e.g. say the above function-pointer call only happens for input B but not for input C.
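A sketch of the transformation in C (hypothetical names; the "specialized" version is what a tracing JIT would effectively emit once it observes that the pointer never changes):

    #include <stdio.h>

    int square(int x) { return x * x; }

    /* before: indirect call through a pointer the compiler can't resolve */
    int apply(int (*op)(int), int x) { return op(x); }

    /* after: what the JIT effectively recompiles to once tracing shows
       that op is always square -- a direct, inlinable call */
    int apply_specialized(int x) { return square(x); }

    int main(void) {
        printf("%d\n", apply(square, 7));     /* 49, via indirect call */
        printf("%d\n", apply_specialized(7)); /* 49, via direct call */
        return 0;
    }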
In theory, some execution patterns are not knowable except at runtime, and those could be optimized after the code has already been running for a while.
In practice, static AOT compilation is essentially always faster, for a couple of reasons. The various kinds of overhead associated with supporting dynamic re-compilation usually aren't offset by the gains. Re-compiling code at runtime is expensive, so it is virtually always done at a lower optimization level than AOT compilation to keep its own overhead in check. CPU silicon is also quite good at efficiently detecting and optimizing execution of many of these cases in static AOT code. You can also do static optimization based on profiling runtime execution (e.g. GCC's -fprofile-generate / -fprofile-use), which is almost (but not quite) the same thing with more steps.
Honestly, it always depends on what "faster" means for you. For one crowd, faster means "fast number crunching" (e.g. anything AI these days). There, statically compiled code reigns supreme, because it is mostly about how fast your very specialized code (e.g. matrix multiplications) runs, and it does not hurt to just ship a specialized, statically compiled version for all possible targets. (IIRC GCC does something like that when building generic code: it emits variants that use different instruction-set extensions (SSE, AVX, etc.) and dispatches to whichever is available at runtime; a sketch of that is below.)
For another crowd, "fast" means that the code they haphazardly threw together in an interpreted language runs fast enough that nobody is negatively affected (which is a completely valid use case, not judging here).
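On the GCC point, a minimal sketch using the target_clones attribute (a real GCC feature; the function and data here are made up). GCC emits one clone per listed target plus a resolver that picks the best one for the running CPU:

    #include <stdio.h>

    /* one clone per target, dispatched at load time via an ifunc resolver */
    __attribute__((target_clones("default", "sse4.2", "avx2")))
    int sum(const int *xs, int n) {
        int acc = 0;
        for (int i = 0; i < n; i++)
            acc += xs[i];
        return acc;
    }

    int main(void) {
        int xs[] = {1, 2, 3, 4};
        printf("%d\n", sum(xs, 4)); /* 10, via the best available clone */
        return 0;
    }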
And to answer your question with some examples:
An interpreter with a JIT compiler might, for example, notice that a for loop always runs for the same number of iterations, unroll the loop, and vectorize the instructions at the same time, for an immediate 4x gain in execution speed.
OTOH, Java's HotSpot JIT compiler tracks how often code is called and, once a "hotspot" is identified, compiles that part of the program.
Last example: if you are using an interpreted language (say, Python), every round trip through "python-land" costs you interpreter overhead. A simple for loop that runs a simple instruction (say: acc = 0; for x in xs: acc += x) will be orders of magnitude slower than calling a dedicated function (numpy.sum(xs)); JITing that code (e.g. with numba) removes the round trip through Python and achieves similar speeds. A sketch is below.
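A runnable sketch of that last point (assumes numpy and numba are installed; the timing claim is the usual outcome, not a measurement):

    import numpy as np
    from numba import njit

    def py_sum(xs):
        acc = 0
        for x in xs:       # every iteration round-trips through the interpreter
            acc += x
        return acc

    @njit                  # numba compiles this loop to machine code on first call
    def jit_sum(xs):
        acc = 0
        for x in xs:       # same loop, but no interpreter round trips
            acc += x
        return acc

    xs = np.arange(1_000_000, dtype=np.int64)
    assert py_sum(xs) == np.sum(xs) == jit_sum(xs)
    # timing these (e.g. with timeit) typically shows py_sum orders of
    # magnitude slower than np.sum or jit_sum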
This is all in theory. Everyone says this as if it's already here, but it's really not (in the sense that these fast JITs are still mostly worse than well-written C).
But what it is, mostly, is choosing optimizations based on runtime characteristics. It's a dynamic analogue to profile-guided optimization. For example, you might have an optimization that trades code size for CPU cycles, which you could choose not to apply at runtime if you're low on memory bandwidth instead of CPU time. Stuff like that.