
Thank you very much for that super analysis done with the help of AI - I really enjoyed reading it. May I ask, are you paying for that service? And if so, how much?

Anyhow, I downloaded your ZIP file and looked into the disassembly. It seems that the disassembler simply disassembled byte by byte, not taking into account that TURBO.COM is both code and data. Since the x86 instruction set is very dense, pretty much every byte sequence turns into legal instructions. Even the ASCII strings were disassembled. Look at address hex 4864 in the file, for example - it should be the string "Write block to file" but got disassembled. I wonder how the AI managed that obscure file.


I ran the analysis using regular Claude. I'm paying $200/month, but the $20/month subscription should work fine too, and it might even work with the free plan.

Turbo Pascal version 6 source code is online and easy to find. It is not the original source code written by Anders Hejlsberg but a fantastic reverse engineering job. Turbo Pascal 7 added just a little bit to the compiler (like "break", "continue", "inherited") and added an optimize step to the code generator (eliminating multiple "les di,…" loads for constant pointers). But in order to study the compiler, version 6 should do.


Good to know - thanks!


I was quite late to the previous "Happy birthday Turbo Pascal" entry, so I repost. @admin: I hope that's ok.

For all who wonder why Turbo Pascal was so fast, here are some insights: https://news.ycombinator.com/item?id=38477688#38486185


The link is about the assembly details of the compiler code. But that still misses the point in a few important ways. And I'm not talking about the Turbo Pascal versions 5 or higher, but about the first ones actually used, like the one for CP/M where the OS and the program all have to fit in 64 K. Turbo Pascal was more than a compiler fitting in that space: it contained an editor, kept the source of the compiled program completely in RAM, and produced the executable with one write to disk and just a few reads.

https://techtinkering.com/2013/03/05/turbo-pascal-a-great-ch...

1) Because the author was very used to thinking in assembly, he was able to build a compiler emitting native code that "behaves well" even without optimizing passes: not squeezing out every possible drop of optimization, but also making a lot of optimization passes unnecessary. That's the "magic sauce" in what was called "a single pass."

2) The description "a single pass" is actually somewhat imprecise as used by most who talk about it. Passes through something are necessary in compilers; the actual goal is doing the processing only once at the levels that hurt the whole compilation process the most.

3) So the major difference between the typical "production" compilers and Turbo Pascal was in the latter being able to fit "everything" in RAM between the last edit and the production of the compiled code. If I remember correctly, it didn't even have to write anything to the disk if one just needed to run the program(!) The write happened only when one needed the executable.

So assembly was an important part of the story, but at the time most professionals used assembly anyway; the speed was the result of every other decision also being made to achieve minimal I/O (which also wasn't necessarily a unique feature), while also being useful enough for real programs (that was less unique) and comfortable (WordStar-key editor, jump to the error after compilation).

A lot of very good decisions combined - that was special. And finally, comparatively cheap, too.


For all who wonder why Turbo Pascal was so fast, here are some insights:

50% is certainly due to the language Pascal itself. Niklaus Wirth designed the language so that it can be compiled in a single pass. In general the design of Pascal is, in my opinion, truly elegant and completely underrated compared to other programming languages. Wirth published a tiny version of his compiler, written in Pascal itself, in his 1976 book "Algorithms + Data Structures = Programs".

In the late 70s Anders Hejlsberg took that version and translated it into assembly. He certainly must have changed the code generator, since Wirth's version emitted bytecode for a tiny VM whereas Anders' version produced machine code; however, if you take a closer look, especially at the scanner and parser of Turbo Pascal and of Wirth's version, you can see that they are very similar. Back then Anders was not so much a language guy, in my opinion, but much more an assembly genius. And that resulted in the other 50% of why Turbo Pascal was so fast:

-) The entire compiler (scanner / parser / type checker / code generator, and later the linker) was written in assembly.

-) The state of the compiler was held as much as possible in CPU registers. If e.g. the parser needed a new token from the token stream, all registers were pushed to the stack and the scanner took over. After the scanner fetched the next token, the registers were restored.

-) The choice of which register held what was also very well thought through. Of course the CPU dictates that to a certain extent, but still there was lots of elegant usage of the "si"/"di" registers in combination with non-repeated lodsb/stosb instructions.

-) The entire "expression logic" (expression parsing / expression state / code generation for expressions) was kinda object oriented (yes, in assembly) with the "di" register hardwired as the "this" pointer. If the compiler needed to handle two expressions (a left expression and a right expression), then one was held in the "di" register and the other one in the "si" register. Since the "di" register was hardcoded, you will find lots of "xchg di,si" in the codebase before a "method" (a procedure with the "di" register as a "this" pointer) is called.

-) Clearly the CPU registers were not enough to hold the entire state of the compiler, so heavy use of global variables was made. Global variables have the advantage of not needing a register to access them (e.g. "inc word ptr [$1234]").

-) Parameter passing was done through registers, and where possible stack frames were avoided (too expensive), meaning no local variables (there was still heavy usage of push/pop within a procedure - does that count as a local?)

-) Parameter passing through registers allowed procedure chaining: instead of "call someOtherProc; retn" at the end of a procedure, just "jmp someOtherProc" was used, to a great extent.

-) Jump tables everywhere. In general the compiler was quite table driven.

-) Strings were avoided as much as possible, and where needed (parsing identifiers / handling filenames) copying them around was avoided as much as possible, meaning all strings were held in global variables. The big exception here was of course the copying of identifiers from the global variable into the symbol table.

-) Starting with Turbo Pascal 4.0, hash tables were used as symbol tables. Arena allocator for memory management.
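The arena-allocator idea from the last point can be sketched in C (a hypothetical illustration of the technique, not Turbo Pascal's actual code; all names are made up):

```c
#include <stdlib.h>

/* Minimal bump/arena allocator: symbol-table entries are carved out of
   one big block, and the whole arena is freed at once - there is no
   per-entry free(), so allocation is just a pointer bump. */
typedef struct {
    char   *base;
    size_t  used;
    size_t  cap;
} Arena;

static void arena_init(Arena *a, size_t cap) {
    a->base = malloc(cap);
    a->used = 0;
    a->cap  = cap;
}

static void *arena_alloc(Arena *a, size_t n) {
    n = (n + 7) & ~(size_t)7;                 /* keep 8-byte alignment */
    if (a->base == NULL || a->used + n > a->cap) return NULL;
    void *p = a->base + a->used;
    a->used += n;
    return p;
}

static void arena_free_all(Arena *a) {
    free(a->base);
    a->base = NULL;
    a->used = a->cap = 0;
}
```

The point of the design: a compiler allocates thousands of small symbol records and throws them all away together at the end of a compilation unit, so per-object bookkeeping is pure overhead.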

I am sure I forgot a lot - I reverse engineered Turbo Pascal back in the late 90s. Most of the above applies to Turbo Pascal 7.0, but a lot of it had not changed since earlier versions.

It is a shame that such a wonderful codebase is buried under the "closed source, proprietary software" label. It is clear that today nobody would write a compiler the way Turbo Pascal was written, not even in a high-level language, but the codebase has so many tricks, so many elegant solutions, that it is a real pity that it is not open source. Of course the codebase is on the web, just not the official one.

Thank you Anders Hejlsberg for such a wonderful piece of software.


All of this is fascinating. I believe single-pass compilation is underrated, and quickly disregarded by a large body of the PL community as anachronistic. I think that's complete nonsense. Just take one look at the massive build infrastructure that's driving modern monorepos to see how incredibly crucial fast compile-times are.

> Jump tables everywhere. In general the compiler was quite table driven.

It would be interesting to see how this approach fares in the face of modern branch prediction on modern CPUs.
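For a feel of what table-driven dispatch looks like in a modern setting, here is the same idea sketched in C (illustrative only; Turbo Pascal did this with raw jump tables in assembly):

```c
/* Table-driven dispatch: one indirect call through a function-pointer
   table replaces an if/else or switch chain over opcodes. */
typedef int (*op_fn)(int, int);

static int op_add(int a, int b) { return a + b; }
static int op_sub(int a, int b) { return a - b; }
static int op_mul(int a, int b) { return a * b; }

static const op_fn dispatch[] = { op_add, op_sub, op_mul };

static int eval(int opcode, int a, int b) {
    return dispatch[opcode](a, b);   /* single indirect branch */
}
```

On modern CPUs the indirect branch can be predicted well when the opcode stream is regular, but a mispredicted indirect jump is considerably more expensive than a well-predicted conditional branch, which is presumably what the comment above is getting at.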


Single-pass is a bit of a gimmick. It requires programs to be written sequentially in a strictly "bottom up" way, so that forward references to parts of the program that have yet to be defined are rare enough that they can be marked specially (e.g. as with C program headers).
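In C, that special marking is the explicit forward declaration; a minimal sketch of the bottom-up discipline a single-pass compiler relies on:

```c
/* A single-pass compiler must see a declaration before any use.
   Mutual recursion forces a forward declaration: */
int is_odd(unsigned n);                   /* forward declaration */

int is_even(unsigned n) {                 /* calls is_odd before its body exists */
    return n == 0 ? 1 : is_odd(n - 1);
}

int is_odd(unsigned n) {
    return n == 0 ? 0 : is_even(n - 1);
}
```

Without the first line, a strictly single-pass compiler would reject the call in `is_even` because the symbol is not yet known.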

It's also largely irrelevant if you want optimized code generation, especially across multiple procedures, since that requires you to read abstract representations of the code into memory and deal with them globally which is the opposite of single-pass compilation.


How often do you need highly optimized codegen during development? Unless I'm working on games, 99.9% of my time (even on highly performance-critical software) is spent evaluating debug builds with zero perf requirements - because I need to implement it correctly first and make sure tests are passing.

I think it's uncontroversial that most fast, statically compiled languages benefit greatly from quick debug builds. It's just that very few of them are designed with this in mind.


I think I have made the argument above that the feasibility of single-pass code gen boils down to how the program is structured, not so much the language design itself. Perhaps current compilers should be reworked to generate code about as quickly as TP did, if optimizations are totally disabled and the code is written to eschew unresolved forward references. But I'm not sure there would be much of a point. And you would still have many language features where going back and performing a second pass over what was previously parsed (and perhaps codegen'd) just can't be avoided.


Delphi is single pass today... this has drawbacks, but the compiler is fast!


> meaning no local variables (still heavy usage of push/pop within a procedure, does this count as a local?)

Isn't that (push/pop) basically how locals work anyway?


I have read that quite often here and there and while the management at Borland certainly was a pain, I don’t think that for Anders Hejlsberg this was the main reason to leave.

I believe (and I have no proof) that during development of Delphi V1.0, Anders realized that his "baby", the source code of the compiler core written in 16-bit assembly, had become worthless. The new 32-bit compiler for Delphi V2.0 was written in C by Peter Sollich.

Anders' new role at Borland was just to be an architect / manager / team leader / whatever… He could have contributed to the new 32-bit compiler the way he does today with TypeScript, but for whatever reason he didn't want to. That, along with the Borland management and the nice $$$ signing bonus offered at Microsoft, made him leave. Again, that's just my take.

Peter Sollich left Borland, I think in 1998, to join Microsoft; to this day he is on the C#/.NET team. https://youtu.be/LPcjSdob9AA

And I still wonder who has written Turbo Pascal for Mac in 1985/86…


It is the official reason he himself gave in this interview,

"Anders Hejlsberg: A craftsman of computer language"

https://behindthetech.libsynpro.com/001-anders-hejlsberg-a-c...

Now if you want to say he made that up, I dunno.


Slight correction from a Turbo Pascal expert ;-)

Turbo Pascal did NOT have an 8-byte fixed-point "currency" type suitable for monetary calculations. It did have an int64 type called "comp", though, which was handled as a float by the FPU and hence not slower than the types "single" (f32), "double" (f64) or "extended" (f80).

IIRC, "currency" came with Delphi V2.0 (or even later), but even then it wasn't slower than other floats when you did heavy calculations with it, as it was also handled by the FPU. Only reads and writes from and to such variables were expensive, as there was always a scaling (multiplication by 1e4 and 1e-4) involved - internally it was that int64 "comp" type. (But here I might be wrong; I never really used "currency", I disassembled lots of Delphi binaries with lots of different data types because I wanted to know how the compiler worked. Today my Delphi knowledge is quite fuzzy.)
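The scaling described above can be sketched in C (a hypothetical illustration of a 10^4-scaled fixed-point type; the names are invented and this is not Delphi's actual implementation):

```c
#include <stdint.h>

/* "currency"-style type: a 64-bit integer holding the value scaled
   by 10^4, i.e. four decimal digits of fraction. */
typedef int64_t currency;

static currency cur_from_units(int64_t units) { return units * 10000; }

/* Converting to/from floating point is where the scaling cost shows: */
static double   cur_to_double(currency c)  { return (double)c * 1e-4; }
static currency cur_from_double(double d)  { return (currency)(d * 1e4); }

/* Addition and subtraction need no scaling at all: */
static currency cur_add(currency a, currency b) { return a + b; }
```

This mirrors the point in the comment: pure integer arithmetic on the scaled value is cheap, while every crossing of the float boundary pays for a multiply by 1e4 or 1e-4.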


In Turbo Pascal :-) you have to filter out NaN or else you will end up with a runtime error.

  const NaN = $FF shl 23;

  var x, t: longint;
      f: single absolute x;

  begin 
    t := 0;
    x := 0;
    repeat
      inc(t, ord((x and NaN <> NaN) and (0<=f) and (f<=1)));
      inc(x);
    until x = 0;
    WriteLn('total: ', t);
  end.
total: 1065353218 (in ca. 40 seconds)


cproc: a C11 compiler

  *) https://github.com/michaelforney/cproc
  *) programming language used for the compiler: C
  *) using QBE as a backend - https://c9x.me/compile/


chibicc: A Small C Compiler

  *) https://github.com/rui314/chibicc
  *) programming language used for the compiler: C
  *) emits assembly text (x86-64)


> In standard Pascal, functions are closures but not first-class values (can be passed as arguments to other functions, but cannot be returned or assigned to variables)

Interesting - being an absolute non-expert in closures - but isn't it the other way around? That is, in Pascal, functions are first-class values but not closures?

Unless I am missing something here - functions can be assigned to variables:

  function add(a,b:integer):integer; { far; in Turbo Pascal }
  begin
    add := a + b;
  end;

  type MyFunc = function (a,b:integer):integer;

  var f:MyFunc;
      x:integer;

  begin
    f := add; { here we assigning a function to a variable }
    x := f(3,4);
  end;

> Some dialects - notably, Turbo Pascal - don't allow pointers to local functions at all.

I know what you mean here, but that's not entirely true. In Turbo Pascal you had two ways to receive a pointer to a function:

1.) the way just shown, and here you are right: Turbo Pascal doesn't allow local functions this way, but that's because of how local functions in Turbo Pascal were implemented. Local functions have a hidden parameter (a 16-bit pointer to the caller's stack frame) pushed onto the stack - how do you handle this hidden parameter when you deal with a function pointer with a defined function signature? To avoid headaches, I guess Hejlsberg simply chose not to allow local functions here.

2.) via a regular pointer - you simply store the function address in a pointer variable (or pass the function address to a pointer parameter), so:

  procedure MyProc;

  { a local function }
  function sub(a,b:integer):integer; { far; in TP }
  begin
    sub := a - b;
  end;

  type MyFuncHidden = function (a,b:integer; hidden:word):integer;

  var p:pointer;
      x:integer;

  begin
    p := @sub; { assigning a local function to a variable }
    x := MyFuncHidden(p)(7,8,0);
  end;

The problem here: now you as the programmer are in charge of handling the parameter passing.
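What the compiler does with that hidden parameter can be mimicked in C by passing the enclosing frame explicitly (a made-up sketch of the idea, not Turbo Pascal's actual calling convention):

```c
/* The caller's "stack frame", reduced to the one variable the
   nested function wants to reach. */
typedef struct { int base; } Frame;

/* A "local function" with the hidden parameter made explicit: */
static int sub_fn(int a, int b, Frame *up) {
    return a - b + up->base;          /* reaches into the caller's frame */
}

static int my_proc(void) {
    Frame f = { 100 };
    /* Turbo Pascal would push the frame pointer implicitly;
       here we pass it by hand, like the 0 passed for "hidden" above. */
    return sub_fn(7, 3, &f);
}
```

The fragility is the same as in the Pascal example: the signature with the extra parameter is an implementation detail the programmer now has to get right by hand.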

Look at the implementation of ForEach/FirstThat in the Turbo Vision library. Both ForEach and FirstThat accept local functions as parameters (and only local functions).


Function pointer types are a Turbo Pascal extension; in standard (Wirth/ISO) Pascal, you cannot have "type = function" etc, you can only use that syntax to declare arguments of function type. So they're not first-class values. They're closures because they capture the enclosing environment (variables).

In your second example, what you're doing is the equivalent of reinterpret_cast between two incompatible pointer types (if they were compatible, you could have just assigned @sub to a variable of type MyFuncHidden). You cannot safely assume that your new signature faithfully corresponds to what the compiler does under the hood - that's an implementation detail. Furthermore, in this case, your function doesn't even try to access variables from the outer scope, and this would break if it did.

