- C is straightforward to compile into fast machine code...on a PDP-11. Its virtual machine does not match modern architectures very well, and its explicitness about the details of its machine means FORTRAN compilers typically produce faster code. The C virtual machine does not provide an accurate model of why your code is fast on a modern machine. The social necessity of implementing a decent C compiler may have stifled our architecture development (go look at the Burroughs systems, or the Connection Machine, and tell me how well C's virtual machine maps to them).
- C's standard library is a joke. Its shortcomings, particularly around string handling, have been responsible for an appalling fraction of the security holes of the past forty years.
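The string-handling hazard is easy to demonstrate. A minimal sketch (function names here are mine, not from any particular codebase): the classic API offers no bounds checking at all, and the usual mitigation is to reach for the length-limited formatting functions instead.

```c
#include <stdio.h>
#include <string.h>

/* The classic footgun: strcpy() trusts the caller to have allocated
   enough room in dst, and silently overruns the buffer if not. */
void risky_copy(char *dst, const char *src) {
    strcpy(dst, src);               /* no bounds check whatsoever */
}

/* The usual mitigation: snprintf() truncates instead of overflowing,
   and always NUL-terminates the result (for size > 0). */
void safer_copy(char *dst, size_t size, const char *src) {
    snprintf(dst, size, "%s", src);
}
```

That nearly every C codebase carries a home-grown variant of `safer_copy` is itself evidence for the complaint above.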
- C's tooling is hardly something to brag about, especially compared to its contemporaries like Smalltalk and Lisp. Most of the debuggers people use with C are command line monstrosities. Compare them to the standard debuggers of, say, Squeak or Allegro Common Lisp.
- Claiming a fast build/debug/run cycle for C is sad. It seems fast because of the failure in this area of C++. Go look at Turbo Pascal if you want to know how to make the build/debug/run cycle fast.
- Claiming that C is callable from anywhere via its standard ABI equates all the world with Unix. Sadly, that's almost true today, but maybe it's because of the ubiquity of C rather than the other way around.
So, before writing about the glories of C, please go familiarize yourself with modern FORTRAN, ALGOL 60 and 68, Turbo Pascal's compiler and environment, a good Smalltalk like Squeak or Pharo, and the state of modern pipelined processor architectures.
I'm pretty ignorant about this stuff, so please don't think I'm trolling.
I'm confused when you speak of a virtual machine with regard to C... can you explain what you mean by this?
I had to wikipedia the Burroughs machine. I guess the big deal is that it's a stack machine? It looks very interesting and I plan to read more about it. But I guess I don't understand why that is a hindrance to C.
The JVM is a stack machine, isn't it?
btw, I haven't read the article yet. It's my habit to check comments first to see if the article was interesting, and seeing your comment made me want to reply for clarification.
The Burroughs was a stack machine, but that's only the beginning. Look at how it handled addressing hunks of memory. Bounds checked memory block references were a hardware type, and they were the only way to get a reference to a block of memory. So basically, null pointers didn't exist at the hardware level, nor out of bounds writes to arrays or strings. Similarly, code and data were distinguished in memory (by high order bits), so you couldn't execute data. It simply wasn't recognized as code by the processor. Also interesting, the Burroughs machines were built to use a variant of ALGOL 60 as both their systems programming language (as in, there wasn't an assembly language beneath it) and as their command language. The whole architecture was designed to run high level procedural languages.
C defines a virtual machine consisting of a single, contiguous block of memory with consecutive addresses, and a single core processor that reads and executes a single instruction at a time. This is not true of today's processors, thanks to multicores, multiple pipelines, and layers of cache.
No, C does not define a single contiguous block of memory with consecutive addresses. It _does_ specify that pointers are scalar types, but that does not imply contiguity or consecutive addresses (with the exception of arrays).
There is no requirement in C that you be able to execute data.
The "abstract machine" of C explicitly does _not_ make reference to the memory layout. (cf 5.1.2.3 of the spec)
It also makes no reference to the number of cores, and order of execution is not one at a time, but limited by sequence points.
That's the whole point of C - it is very loosely tied to actual hardware, and can accommodate a wide range of it, while still staying very close to actual realities.
Edit: Please don't take this as a "rah, rah, C is great" comment. I'm well aware of its shortcomings. I've spent the last 20+ years with it :)
I would argue that C's problem is not that it's too strictly defined, but that it's too poorly defined. An in-depth look into all the cases of undefined behavior in C will show what I mean.
You want to really understand C? Read this[0]. John really understands C.
Can't upvote this enough. Well, except that I'd replace "poorly" with "vaguely". "Implementation-defined behavior" is there for very good reasons in every single case it's there.
Sidenote: With John's name, I'd be tempted on a daily basis to change the last two letters of my name to a single 'x' ;)
You can use any separator you want in an s/../../ expression, not just /, in this case the separator is _ (this technique allows you to use / without creating a "picket fence": s/r\/x/r$\//).
I'd argue that there's a big distinction between C as described in the standard and C as actually used in real-world code, and the latter has much stricter semantics and is harder to work with. A C compiler that follows the standard but not the implied conventions won't do well.
For example, take NULL. Even on a machine with no concept of NULL, you could easily emulate it by allocating a small block of memory and having the pointer to that block be designated as "NULL". This would be perfectly compliant with the standard, but it will break all of the code out there that assumes that NULL is all zero bits (e.g. that calloc or memset(0) produce pointers whose values contain NULL). Which is a lot of code. I'm sure that many other examples can be found.
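The NULL example above can be made concrete. The sketch below shows the implied convention: real code zeroes a struct with `memset` and then treats its pointer members as null, even though the standard only guarantees that for pointers assigned the null pointer constant, not for an all-zero-bits representation.

```c
#include <string.h>

struct node { struct node *next; int value; };

/* A huge amount of real code zeroes a struct and then treats its
   pointer members as NULL. The standard does not promise this: an
   all-zero-bits object representation need not be the null pointer.
   On every mainstream platform it is -- which is exactly the implied
   convention (stricter than the standard) described above. */
int looks_null_after_memset(void) {
    struct node n;
    memset(&n, 0, sizeof n);
    return n.next == NULL;   /* true in practice, unguaranteed in theory */
}
```

A compiler/platform that broke this would be standard-conforming and practically unusable, which is the point being made.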
"C defines a virtual machine consisting of a single, contiguous block of memory with consecutive addresses"
This is 100% false. The C standard makes no mention whatsoever of memory. I don't know much about the burroughs machine, but it sounds like it would map very well to the C virtual machine:
C permits an implementation to provide a reversible mapping from pointers to "sufficiently large integers" but does not require it.
A pointer to an object is only valid in C (i.e. only has defined behavior) if it is never accessed outside the bounds of the object it points to.
Converting between data pointers and function pointers is not required to work in the C standard either.
C does require that you have a NULL pointer that has undefined behavior if you dereference it, but this could be trivially done by the runtime by allocating a single unit of memory for it.
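The pointer-to-integer point in the list above is worth illustrating. A hedged sketch: where the optional `uintptr_t` type exists, the standard only promises that the round trip yields a pointer comparing equal to the original; nothing is promised about the integer values themselves (for example, that distinct objects get nearby addresses).

```c
#include <stdint.h>

/* uintptr_t is an *optional* type in C99/C11. Where it exists, the
   only guarantee is that converting a valid object pointer to it and
   back yields a pointer that compares equal to the original. The
   integer's actual value is implementation-defined, so nothing about
   address layout can portably be inferred from it. */
int roundtrip_compares_equal(int *p) {
    uintptr_t bits = (uintptr_t)p;   /* implementation-defined value */
    int *q = (int *)bits;            /* guaranteed: q == p */
    return q == p;
}
```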
>C defines a virtual machine consisting of a single, contiguous block of memory with consecutive addresses, and a single core processor that reads and executes a single instruction at a time. This is not true of today's processors, thanks to multicores, multiple pipelines, and layers of cache.
Which is true, for a rather stretched definition of "virtual machine" (which falls apart at the kernel level, because it's pretty hard to work with a machine's abstraction when you're working directly on the hardware).
The problem with the virtual machine comparison is that C doesn't mask ABI access in any meaningful way. It doesn't need to, since it's directly accessing the ABI and OS. So the argument that C isn't multithreaded is rather shortsighted, because C doesn't need that functionality in the language. It's provided by the OS.
FYI when discussing the ISO C standard the term "virtual machine" is well understood to be the abstracted view of the hardware presented to user code. Things well defined in it are portable, things implementation defined are non-portable, and things undefined should be avoided at all costs.
To a C programmer, this is like watching virgins attempt to have sex. Normal people just write some code which does some sh*t and that's OK. We don't need to deeply reflect on whether it's cache optimal, because that will change next week. Just good, clean C. When did that become a strange thing to do?
Is there such a thing? It seems like every C program, even ones that are praised as being excellently written, are a mess of pointers, memory management, conditional statements that check for errors, special return value codes, and so forth.
To put it another way, look at the difference between an implementation of a Red/Black Tree in C and one written in Lisp or Haskell. Not only are basic things overly difficult to get right in C, but C does not become any easier as problem sizes scale up; it lacks expressiveness and forces programmers to deal with low-level details regardless of how high-level their program is.
"Turns out clear thought in any language is the main thing."
No, the ability to express your thought clearly is the main thing -- and that is why languages matter. If your code is cluttered with pointer-juggling, error-checking conditional statements, and the other hoops C forces you to jump through, then your code is not clear.
Try expressing your code clearly in BF, then get back to me about this "languages don't matter as long as you have clear thought" philosophy.
I'm a professional pentester and I have been a C programmer for well over 5 years, but I acknowledge that my C is probably still pretty bad :) how about you? :)
P.S.: now I have figured you out (on a very basic level of course) and I have a lot of respect, but nonetheless, let's play :)
I've been writing kernel code in C for about 8 years, including a hardware virtualization subsystem for use on HPCs. I used to teach Network Security and Penetration, but I lost interest in security and moved on to programming language development.
My code, in any language, is full of bugs. The difference is that in C my bugs turn into major security vulnerabilities. C is also a terrible language in that you never write C -- you write a specific compiler's implementation of C. If a strict optimizing compiler were to get a hold of any C I've ever written, I'm sure it would emit garbage. All the other languages I write code in? Not so much.
Based on that I will buy you your beverage of choice at any conference you choose :)
P.S.: I've probably written commercial stuff you work with, and also I don't give a shit if you give a shit, if you see where I am coming from. I have a pretty good idea of what the compiler will do and I will be pissed off if it doesn't do that. It normally does.
Thanks. I hope you didn't take my first comment as an insult.
What I meant by that is C is not just something you sit down with after skimming the specification and "bang out." There are years of community "inherited knowledge" on how to write C so it doesn't result in disaster. The very need for these practices exemplifies the flaws in C as a language -- by the very nature of working around these vulnerabilities, you acknowledge that they are vulnerabilities. Thus, if one doesn't see C's issues then one is doomed to C's mistakes (this sentence is very strange when read out loud).
I think that your situation is pretty different from most programming projects in that you are way closer to the machine than most people need to be. Also, you are working on an OS, which is particularly sensitive to compiler nuances. I would have a hard time imagining different compilers spitting out garbage with the standard "hello world". Now the almost mandatory caveat: I know that C has its flaws, but not all programming projects are the same. Projects which are not like yours will have the "you write a specific compiler's implementation of C" problem in way smaller doses than you (possibly to the point of not having it at all, like hello world).
I'll have to read more about the memory references to get a feel for that.
However it speaks of a compiler for ALGOL... it was compiled down to machine instructions. Assembly is just a representation of machine instructions, so I don't see how it can be said to not have an assembly language.
Maybe nobody ever bothered to write an assembler, but that doesn't mean that it somehow directly executes ALGOL.
Thanks for your replies, you have given me some food for thought.
> However it speaks of a compiler for ALGOL... it was compiled down to machine instructions. Assembly is just a representation of machine instructions, so I don't see how it can be said to not have an assembly language.
In this sense, you're completely right. But I think that people who grok the system mean something a bit different when they say it doesn't have an assembly language. (Disclaimer: I have no firsthand experience with Burroughs mainframes.)
The Burroughs system didn't execute Algol directly, true. But the machine representation that you compiled down to was essentially a high-level proto-Algol. It wasn't a distinct, "first-class citizen". It was, if you like, Algol "virtual machine bytecode" for a virtual machine that wasn't virtual.
If you're writing in C, or some other higher-level programming languages, there are times when you want more fine-grained control over the hardware than the more plush languages provide. That's the time to drop down to assembly code, to talk to the computer "in its own language".
The Burroughs mainframes had nothing analogous to that. The system was designed to map as directly to Algol as they could manage. Its machine language wasn't distinct from the higher-level language that you were supposed to use. To talk to a Burroughs system "in its own language" would be to write a rather more verbose expression of the Algol code you'd have had to write anyway, but not particularly different in principle.
So, I guess the answer to whether or not the Burroughs systems did or did not have an assembly language is a philosophical one. :P
C doesn't care for fancy terms like VM, multicore, threads, ... But you can always make a library and implement what you need. This approach has advantages; for example, you can share memory pages between processes, because that kind of stuff is part of the hardware/OS, not the C language. It would be stupid to implement it directly in the C language. You will now say that this is a reason why C is bad; I say it is a reason why C has been so popular all these years.
> C defines a virtual machine consisting of a single, contiguous block of memory with consecutive addresses, and a single core processor that reads and executes a single instruction at a time. This is not true of today's processors, thanks to multicores, multiple pipelines, and layers of cache.
This type of machine has become so ubiquitous that people have begun to forget that once upon a time, other types of machines also existed. (Think of the LISP Machine)
I'd say "abstract virtual machine." You are just confusing people. "Virtual machine" most commonly refers to a discrete program that presents a defined computational interface that everyone calls the virtual machine. This VM program must be run independently of the code you wrote.
For C there is no such virtual machine process. The "virtual machine" for C is abstract and defined implicitly.
Second this; in all my years (granted, not a lot, but enough), this is the first time I've heard anyone claim that C has a virtual machine. You can hem and haw and stretch the definition all you want, but when it compiles to assembler, I think that most reasonable people would no longer claim that's a "virtual" machine.
Edit: if you want to argue that C enforces a certain view of programming, a "paradigm" if you will (snark), then say that. Don't say "virtual machine", where most people will go "what? when did C start running on JVM/.NET/etc?".
Given the way that LLVM has come onto the scene, I'm not sure I'd agree. C defines assumptions in the programming environment and does not guarantee that it at all resembles the underlying hardware. You are never coding to the hardware (unless you are doing heinous magic), you're coding to C. That's a "virtual machine" to me.
The concept of C as a virtual machine isn't new (I first heard it around 2006 or so? I don't think it was new then) and it's much more descriptive than referring to its "model of computation".
The common definition of a process virtual machine is that it's an interpreter that can be written to that essentially emulates an OS environment, giving abstracted OS concepts and functionality. This aids with portability. Another concept of virtual machines in general is, for lack of a better term, sandboxing. You're limited to only the functionality that the VM provides.
C goes halfway with that. You generally don't need to care about most OS operations if you're using the standard library (which abstracts most OS calls), but you definitely do need to care about the underlying OS and architecture if you're doing much more than that. Also, the base C definition doesn't allow for threads or IPC, both of which are provided by the POSIX libraries. You're also allowed to directly access the ABI and underlying OS calls through C.
The best example of C not really having a VM is endianness. If C had a "true" virtual machine, the programmer really shouldn't need to be aware of this. But everyone that's written network code on x86 platforms in C knows that you need to keep it in mind. Network byte order is big endian, but x86 is little endian, so you need to translate everything before it hits the network.
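The endianness point can be shown without any networking code at all. A small sketch: serializing a 32-bit value byte-by-byte in network (big-endian) order is portable precisely because it never relies on the host's representation, while detecting the host's own byte order requires peeking at that representation.

```c
#include <stdint.h>
#include <string.h>

/* C deliberately leaves byte order implementation-defined, which is
   why network code must serialize explicitly. Writing a 32-bit value
   out byte-by-byte in network (big-endian) order works regardless of
   the host's endianness: */
void put_be32(unsigned char out[4], uint32_t v) {
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)(v);
}

/* Finding the host's own order requires inspecting representation: */
int host_is_little_endian(void) {
    uint32_t one = 1;
    unsigned char first;
    memcpy(&first, &one, 1);
    return first == 1;      /* 1 on x86, 0 on big-endian hosts */
}
```

The `htonl`/`ntohs` family from POSIX is essentially a packaged version of the first function.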
EDIT: I think LLVM is somewhat of a red herring in this context. Realistically, unless you're writing straight assembly, there's nothing stopping anyone from writing a VM-based implementation for any language. The problem with C and the other mid to low level languages is that if you're writing the VM, you need to provide a machine that not only works with the underlying computational model, but also provide abstractions for all the additional OS-level functionality that people use.
So C could definitely become a VM-based language, especially if the intermediate form is flexible enough.
"The common definition of a process virtual machine is that it's an interpreter that can be written to that essentially emulates an OS environment, giving abstracted OS concepts and functionality."
Is it? I have seen "virtual machine" used to describe the process abstraction and to describe the IR in compilers (hence "Java Virtual Machine"), and to describe the Forth environment (similar to compilers).
For the same reason that if you put a hunk of chocolate in the oven and called it "hot chocolate," people would be confused that it's not a warm beverage of chocolate and milk.
That is, the phrase "virtual machine" is usually assumed to be the name for a piece of software that pretends to be some particular hardware. It is less commonly used to mean a "virtual machine", that is, not a noun unto itself, but the adjective virtual followed by the noun machine.
The term "virtual machine" is already pretty overloaded. This isn't referring to virtualized hardware in the VMWare sense or a language/platform virtual machine in the JVM sense. Rather, it's talking about how C's abstraction of the hardware has the Von Neumann bottleneck baked into it, so it clashes with fundamentally different architectures like the Burroughs 5000's.
The C language specification [0] defines an abstract machine and defines C semantics in terms of this machine. 5.1.2.3 §1:
> The semantic descriptions in this International Standard describe the behavior of an
abstract machine in which issues of optimization are irrelevant.
"Virtual machine" in this context refers to the computation model of the language. In C, that model is essentially a single-CPU machine with a big memory space that can be accessed with pointers (including a possibly disjoint memory space for program code).
Other models are possible; for example, lambda calculus and combinator logic are based on a model where computation is performed by reducing expressions, without any big memory space and without pointers. Prolog is based on a model where computation is performed by answering a query based on a database of rules. These are all "virtual machines" -- the realization of these computation models is based on compiling a program for a specific machine. It is no different with C; C just happens to use a model that is very similar to the real machine the program executes on (but it is not necessarily identical; e.g., you probably do not have so much RAM that any 64-bit pointer would correspond to a real address on your machine).
Because C compilers write out to native code. It may help to think that C is the virtual machine language (as well as its specification for the virtual machine). This concept has been extended by things like Clang, that transform C to a somewhat more generic underlying language representation (LLVM bytecode) before compiling to native code.
You can ahead-of-time compile Mono code to ARM; that doesn't mean it's not defining a virtual execution environment.
I'm also a novice on low-level stuff, but if I had to guess...
I'd guess that the virtual machine of C pertains to the addressing and the presentation of memory as a "giant array of bytes". Stack addresses start high and "grow down", heap addresses start low. These addresses need not exist on the machine. For example, two running C processes can have 0x3a4b7e pointing to different places in machine memory (which prevents them from clobbering each other).
Please, someone with more knowledge than me, fill me in on where I'm right and wrong.
C does not require the presentation of memory as a "giant array of bytes"---certainly when you have a pointer, it points to an array of bytes (or rather, a contiguous array of items of the pointer type), but that's about it. The stack does not have to start high and grow down (in fact, the Linux man page for clone() states that the stack on the HP PA architecture grows upward) and the heap doesn't necessarily start low and grow up (mmap() for instance).
You are also confusing multitasking/multiprocessing with C. C has no such concept (well, C11 might, I haven't looked fully into it) and on some systems (like plenty of systems in the 80s) only one program runs at a time. The fact that two "programs" both have a pointer to 0x3A4B7E that reference two physically different locations in memory is an operating system abstraction, not a C abstraction.
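To sharpen the language-vs-OS distinction above: about the only memory-layout guarantee the C language itself makes is per-object. A small illustrative sketch of what is and isn't defined:

```c
/* Pointer arithmetic is defined only within a single object: inside
   an array, plus the one-past-the-end position. Relational comparison
   of pointers into *different* objects is undefined behavior. Where
   the OS puts the stack and heap, which way either grows, and how two
   processes' address spaces relate are all outside the language. */
int within_array_arithmetic(void) {
    int a[4] = {10, 20, 30, 40};
    int *p = a;
    int *end = a + 4;          /* one-past-the-end: valid to form */
    return (int)(end - p);     /* defined: element count, 4 */
}
```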
C pointer aliasing defeats certain compiler optimizations that can be made in other languages, and is frequently brought up in C vs FORTRAN comparisons. I think that's probably what the GP had in mind.
C99 includes restricted pointers, but support is a bit spotty. Microsoft's compiler (which is of course really just a C++ compiler) includes it as a nonstandard keyword, too.
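For readers unfamiliar with it, `restrict` is a promise from the programmer, not a check by the compiler. A minimal sketch of the kind of loop where it matters:

```c
/* C99's restrict promises the compiler that, for the pointer's
   lifetime, the pointed-to memory is accessed only through it. That
   recovers the no-alias assumption Fortran compilers get for free,
   enabling vectorization of loops like this one. The promise is
   unchecked: calling this with overlapping x and y ranges is
   undefined behavior. */
void axpy(int n, double a, const double *restrict x, double *restrict y) {
    for (int i = 0; i < n; i++)
        y[i] += a * x[i];
}
```

Without `restrict`, the compiler must assume a store to `y[i]` might change `x[i+1]` and reload accordingly.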
Well, that is how memory, the hardware we have, and all normal operating systems work, but if you want to discuss other stuff we can do that too :) Try serious debugging and you will find that all your preconceptions are confirmed, yet it's still hard to know WTF is going on.
The "memory is just a giant array of bytes" abstraction hasn't been true ever since DRAM has existed (because DRAM is divided into pages), ever since caches were introduced, and certainly isn't true now that even commodity multi-processors are NUMA with message-passing between memory nodes.
Look, if we want to be super anal about shit, all memory is slowly discharging capacitors with basically random access times based on how busy the bus circuitry is with our shit this week. It turns out that memory is really complicated stuff if you look at it deeply, but the magic of modern computer architecture is that you get to (hopefully) keep your simple model for as long as you can. If you were to try to model actual memory latency: here's a shortcut: you can't. That's why everyone bullshits it.
True, and this is my biggest problem with writing optimized code in C -- it takes a lot of guessing and inspecting the generated assembler and understanding your particular platform to make sure you're ACTUALLY utilizing registers and cache like you intend.
If there were some way of expressing this intent through the language and have the compiler enforce it, that'd be fantastic :)
That said, there's really not a better solution to the problem than C, just pointing out that even C is often far less than ideal in this arena.
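One concrete piece of this gap: C does guarantee row-major array layout, but everything about the cache consequences of that layout is invisible to the language. A sketch of the standard example:

```c
#define N 64

/* Both functions compute the same sum. The first walks consecutive
   addresses (C mandates row-major layout), which is cache-friendly on
   every mainstream machine; the second strides N ints per step. The
   language guarantees only the layout and the result -- the often
   large performance difference can't be expressed or enforced in C. */
long sum_row_major(int m[N][N]) {
    long s = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            s += m[i][j];        /* consecutive addresses */
    return s;
}

long sum_col_major(int m[N][N]) {
    long s = 0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            s += m[i][j];        /* stride of N ints per step */
    return s;
}
```

Which is exactly the complaint: the intent ("keep this traversal cache-friendly") lives in a comment, not in anything the compiler checks.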
Madhadron, you make a lot of claims but provide no detail. Also using terms like "virtual machine" with respect to C is plainly ridiculous and a case of bullshit baffles brains.
Turbo Pascal vs C? Really? In its time Turbo Pascal was an amazing piece of software but in the grand scheme of things it is a pimple compared to the whale that is C. Please compare all software written in Turbo Pascal as opposed to C if you have any doubts. The same goes for Smalltalk, Lisp, Algol 60/68. All great products/languages but very niche.
Fortran can be faster than C in some areas but again it is a niche language.
I could go on a lot more but quite frankly I don't think your post merits much more discussion and is borderline trollish.
> Madhadron, you make a lot of claims but provide no detail. Also using terms like "virtual machine" with respect to C is plainly ridiculous and a case of bullshit baffles brains.
C as a very thin virtual machine is a common conception and not an incorrect one--C runs on many systems with noncontiguous memory segments but presents it as a single contiguous space, for example. The idea of C as a virtual machine is much of the basis of LLVM, and to the best of my knowledge I've never worked on a computer where C represented the underlying hardware without significant abstractions.
If you're going to accuse somebody of trolling, you should know what you're talking about first.
> C as a very thin virtual machine is a common conception and not an incorrect one
I have worked extensively with C in the past and have never heard it referred to as a virtual machine, thin or otherwise. I understand that the OP probably means "computation model" or something similar, but the phrase "virtual machine" struck me as a bit bombastic, and that, combined with the general tone of the post, made me think it was borderline trolling.
BTW. I would be quite happy to be proved wrong about C commonly being referred to as a thin virtual machine - what books/literature refer to C in this way?
"I have worked extensively in the past with C and have never heard it referred to as a virtual machine, thin or otherwise."
True. Usually other terms are used; for instance, "memory model". If you google that you'll find some things. As you read them, notice that they may or may not match the hardware you are actually running on, and given the simplicity of C's model, nowadays almost never does.
C is a low-level language that lets you close to the machine, and even lets you drop into assembler, but it is true that it is enforcing/providing a certain set of constraints on how the library and code will act that do not necessarily match the underlying machine. It may not be a "virtual machine", depending on exactly how you define that term, but the idea isn't that far off.
Also, this is a good thing, not a criticism. If it really was just a "high level assembler" that provided no guarantees about the system, it would be as portable as assembler, which is to say, not at all.
For a much more clear example of this sort of thing in a language similarly thought to be "close to the metal", look at C++'s much more thorough specification of its memory and concurrency model, and again notice that this model is expected to be provided everywhere you can use a C++ compiler, regardless of the underlying hardware or OS. It is, in its own way, a virtual machine specification.
C as a very thin virtual machine is a common conception and not an incorrect one--C runs on many systems with noncontiguous memory segments but presents it as a single contiguous space, for example.
C does no such thing. That is the job of the MMU, and C has nothing to say about it. You're going to have a hard time convincing anyone that a language without a runtime somehow has a VM. That's nonsense.
It doesn't matter, his description is still the same kind of nonsense in line with having a VM. I get the feeling that he thought the thing he meant to say is actually the thing he said.
Sorry, I don't see that. What he said is clear to me: The C virtual machine does not provide an accurate model of why your code is fast on a modern machine.
He's saying that the abstraction that C provides has deviated significantly from the underlying hardware. Considering the kinds of hoops that modern processor go through, this is a valid point.
And the above should answer the sibling's question, too.
He also said "C runs on many systems with noncontiguous memory segments but presents it as a single contiguous space." That is 100% false. In C, pointers to different objects are different. They're not even comparable. There is no concept of contiguous memory beyond that of a single array. Two arrays are not contiguous.
I'm not aware of an implementation that does that, and it's not required by the standard. A more common solution is to just limit the size of an array to the size of a segment (SIZE_MAX would also then be the segment size). If you think the C standard requires crossing segments, what section, what words?
Try reading the part I quoted. I think you're conflating my criticisms with that of someone else. Yes, C abstracts the hardware, and always has. That's the point I made in the first place: its memory abstraction is a prerequisite. Hardware and operating systems provide that to C, in direct contradiction to the comment I replied to (which claims C provides it.)
The term "virtual machine" has several meanings, only one of which is what you're thinking of (something like the JVM or CLR). In the context Madhadron's using it, "virtual machine" means the abstraction of the hardware that the C compiler presents to the running program. E.g. it presents an illusion of sequential execution even on machines such as Itanium that are explicitly parallel at the hardware level. It presents the illusion of memory being a uniform array of bytes, when on a modern machine it's actually composed of several disjoint pieces.
CPUs and operating systems go to substantial lengths to present programs with a "C machine" on modern hardware. Your multi-core CPU doesn't look like a PDP-11, but it uses reorder buffers to make execution look sequential and cache coherence built on message passing to make memory look uniform.
Also using terms like "virtual machine" with respect to C is plainly ridiculous and a case of bullshit baffles brains.
Note that the C language standards are written in terms of an "abstract machine". For example, "When the processing of the abstract machine is interrupted by receipt of a signal, only the values of objects as of the previous sequence point may be relied on" from 5.1.2.3.
Seems to me that the GP was referring to the C abstract machine but used a similar, more common phrase by mistake.
Granting that, I can't say I agree with his point: C has probably run on more machines than any other language. As an abstraction over machines it is, if not perfect, clearly good enough.
Maybe because he uses C and has never heard anyone referring to a VM and C in the same sentence. I'm also interested in this because I also don't get wtf the "C VM" is. He could be referring to the layer below, which includes the hardware and OS combo. You know, memory pages, translations, ... Is he talking about that? :-)
Programming languages are independent of their canonical implementation.
Usually the choice of a compiler, interpreter or JIT as the default implementation is a matter of return on investment for the set of tasks the language is targeted at.
I believe he actually intends to talk about the C abstract machine, which is a well-defined technical term (and is actually used extensively in the C standard).
> So, before writing about the glories of C, please go familiarize yourself with modern FORTRAN, ALGOL 60 and 68, Turbo Pascal's compiler and environment, a good Smalltalk like Squeak or Pharo, and the state of modern pipelines processor architectures.
All true, but how are you going to implement a single project using just the best parts of FORTRAN, ALGOL 60, Turbo Pascal and Smalltalk?
Yes, and that's the tragedy of it all...and the real reason everyone uses C. It's not that C is a well designed language appropriate for highly optimized compiling, with wonderful tooling and an elegant standard library that takes care of the difficult or insecure or repetitive tasks.
It's that I used to be able to sit down at any Unix machine in the world, shove some code into an editor, call some system libraries, get it to compile, and maybe even step through it in a debugger, and go home at the end of the day, and I could do that because that's what its designers were doing with it, so they made sure that all of it at least functioned at a good enough level to get through the day. Not always anymore, since a lot of Linux distributions no longer install the C compiler by default.
The fact that it is "not always anymore" may be a blessing in disguise. The focus of attention in the last decade or so has been all around heavier, VM-centric languages like Java or Ruby, but opportunities for new systems programming technology are coming up more often these days.
Fortran compilers sometimes produce faster code, for a limited subdomain of problems, because Fortran's semantics leave the compiler less restricted: all arrays are assumed not to alias, and the compiler has more freedom to reassociate arithmetic and apply other unsafe arithmetic transformations that are disallowed by C's stricter model.
C compilers let you specify that you want similarly relaxed semantics via compiler flags and language keywords; however, they also allow the programmer to have stricter behavior when necessary, e.g. allowing one to operate on memory that may actually alias and still get correct behavior.
The article argued that C was effective in practice, not that it was better than X in some other sense. Would you prefer to write production code in FORTRAN, ALGOL 60 and 68, Turbo Pascal, or Smalltalk to writing it in C today?
Seriously? Actual production code in ALGOL in 2013? If I ever had to deal with that code, I think I'd feel about like you would have if I wrote this comment in Aramaic because I prefer it to English on numerous grounds. I'd very much prefer the extremely unpopular Racket which is at least alive to some degree.
I think the author's criteria for "effectiveness" are very much different from yours, hence the different conclusions.
I've read this comparison between Go and ALGOL-68, and I did enjoy it. Here are quotes from it that explain the difference between "good" and "effective" - exactly the point I was making here (the article we're discussing claims that C is effective, not that it is good in whatever sense):
"So why am I comparing Go with a language 41 years older? Because Go is not a good language."
But...
"I'm not suggesting that everyone drop Go and switch to Algol-68. There's a big difference between a good programming language and an effective programming language. Just look at Java. Go has a modern compiler, proper tools, libraries, a development community, and people wanting to use it. Algol-68 has none of these. I could use Go to get proper work done; I probably couldn't in Algol-68. Go is effective in a way that Algol-68 isn't."
Note that the author of a parent comment actually claimed to prefer ALGOL-68 to C for writing production code today, in practice.
Modern FORTRAN is absolutely horrific, to the extent that F77 is far more popular than more recent flavors (like F90 or F95). It suffers from the same types of problems that the author identifies in C++ and Java.
As for the standard library, I learned from K&R and have never been surprised by the library (insofar as I can reasonably predict what will happen, given the guidelines). I cannot say that about the C++ standard library (don't get me started on the STL) or Java's.
And as for debugging, it took me years to become adept at parsing STL error messages (wading through cruft like basic_string to get to the essence of the error).
However, I do agree that Borland and Turbo Pascal are exemplary.
The overarching point in the Erlang example is that the bug stemmed from something in Erlang itself. C and the standard library are sufficiently documented and tested that there is no ambiguity. (Regarding the natural counterpoint about indeterminate expressions like i++ + ++i, the language specification is also clear that such forms are undefined.)
I thought the major point is that it's a "high level" language, high enough that a compiler can do serious optimizations. Read Fran Allen's interview in Coders at Work; she despairs of writing efficient compilers/programs for C because it's so low level.
Now, sometimes you need that low level access, but C is used way beyond those niches.
FORTRAN 77 is far more popular because it took so long for a GNU FORTRAN 9x compiler to appear, and meanwhile there are millions of lines of FORTRAN 77 in enormous libraries.
The problem with the standard library is that it is tiny and its basic types are often ill designed. Certainly it's fairly consistent, and so fairly unsurprising, but that's not the argument.
It also has to do with the type of people that program in FORTRAN. They often have backgrounds more grounded in Maths or Physics than computers. I don't think the GNU compiler had much to do with it because people using FORTRAN are more likely to be using something else (we use Solaris Studio where I work).
The C 'virtual machine' is effective mostly at polarizing otherwise rational individuals.
The use of the term 'virtual machine' for C is appropriate when an individual wants to purposefully show that he has a 'better' handle on the semantics, and that a mere mortal surely would not understand his glorious knowledge.
Can you elaborate on your first point a bit? How, specifically, does C's machine model not fit current architectures well, aside from multiple cores? How is Fortran's model better?
C is based on a simplistic view of the computer as a Turing machine (or von Neumann machine, if you'd prefer).
Since the 70s or 80s, CPUs have gotten a lot faster while memory access has only gotten incrementally faster. This means that CPU manufacturers have put an increasingly sophisticated caching system on the CPU die to speed up access to main memory contents.
C's simple model, with pointer arithmetic, aliased pointers that can refer to the same memory, and so on, means that compilers can't produce optimal cache behavior, because they can't reason definitively about when a given region of memory will be accessed (a pointer might have been modified, or might overlap another one). They also have trouble emitting SIMD instructions, which do math quickly on vectors. Fortran, with stricter rules about arrays, is more conducive to this.
Pointer aliasing causes additional problems with today's multiple-issue CPUs, which are nothing like the CPUs of the 80s. Because aliasing ties the compiler's hands, it may have trouble reordering code to maximize a single core's instruction-level parallelism.
Aliasing is not really an architecture specific concern.
Furthermore the vast majority of languages allow free aliasing, differing from C in what things pointers are allowed to point at and not in the number of pointers which may point at them.
I agree. C is outdated and does not represent current machines very well. Nor does it benefit from the fruits of years of programming language research. Different layers of cache? Multiple cores? Stream processing? A type system that is the equivalent of stones and sticks in the age of guns? A joke of a "macro" system that is a minuscule improvement over sed expressions in your Makefile?
However, it is here to stay, just like the inch, the foot, and Fahrenheit. Not that the alternatives are worse; it's just that the cost of switching over and retraining a whole lot of people is back-breaking.
Note that most unfortunately, it is being taken over by javascript (false == '0'), which is a far worse language in many aspects.
> - C is straightforward to compile into fast machine code...on a PDP-11. Its virtual machine does not match modern architectures very well, and its explicitness about details of its machine mean FORTRAN compilers typically produce faster code...
Nothing is straightforward to compile into efficient machine code, and FORTRAN is not at all easier than C. Many compilers (e.g. GCC) use the same back-end for FORTRAN, C, and other source languages. FORTRAN's clumsy arrays are a little nicer to the compiler than C's flexible pointers, but the "restrict" keyword solves this.
> - C's tooling is hardly something to brag about, especially compared to its contemporaries like Smalltalk and Lisp. Most of the debuggers people use with C are command line monstrosities.
Yes, C debuggers may be a little clumsy compared to modern source-level debuggers. But you can plug gdb into almost anything: a regular user-space application, a microcontroller board like an Arduino, a kernel image running in QEMU, Win32 code debugged remotely from Linux, a live CPU with a debugger dongle, etc.
> - Claiming a fast build/debug/run cycle for C is sad. It seems fast because of the failure in this area of C++. Go look at Turbo Pascal if you want to know how to make the build/debug/run cycle fast.
What's the point in compiling slow code quickly? Turbo Pascal was not really a good compiler, especially if you compare its output to a modern compiler's.
Also take a look at the progress going on with LLVM/Clang/LLDB, they're working on better debugging and REPL-like environments for C.
Is there any way for software, even in assembly, to make intelligent use of on-CPU caches? I thought they were completely abstracted away by the hardware.
In addition, many CPUs provide prefetch instructions to make sure data is in cache before it's used, or cache-skipping loads and stores to prevent polluting cache with data that's only ever touched once.
C is straightforward to compile on pretty much every single CPU out there. In fact, a C compiler is required to certify many very common CPUs before they go into manufacturing. In many fabs you cannot go to silicon unless you have demonstrated that the CPU can handle code, and most of the code doing all this testing is in C (with lots of assembly too). C is a front-line language at the fabrication level.
- C's standard library may be sparse, but the diversity of other libraries out there means you can do your own thing and have complete control over what gets packed into your final binary image. This is a given, and it is powerful. A scan of /usr/lib, /usr/local/lib, or a search for .so/.dylib files in /Applications, for example, turns up, quite literally, nothing I can't link to with a C compiler.
- My fast edit/build/run/debug cycle goes like this: hit the hot-key in my editor, watch the output, done. Of course, I also have periods where I must sit and compile a larger collection of libraries, usually at natural break points in my development cycle, but this is all taken care of by my build server, which runs independently of the development servers. With C, I've been able to easily scale my build/release/developer binary-packaging requirements from "needs instant interaction between new code and running tests" to "needs an hour to generate keys and dependencies and be prepared for a solid few years of use". C's build time scales with the productivity of the developer.
- C is highly productive, if you do not trip over key things. Yes, building your own libs package should be a core part of professional C competency; any C developer worth their salt should be able to add a few lines of code to a multi-million-line codebase without sitting around waiting for the kettle to boil while it builds for testing. A proper C Makefile can build an entire operating system's worth of applications; for domain-specific apps, I hardly even notice the build time any more between edits and updates. C code builds so fast and integrates so well with repository/testing tools that, in my opinion, build time is no longer a relevant concern.
Now, if you need a prepackaged thing, and don't want to hassle with needing to know how to assume an iron grip over the dependencies issue, then of course C is a big hassle. Other languages have their time, and place.
You know what else I like about C? Lua. With C and Lua, and a careful selection of optimal libraries, you can build a development environment beyond compare in any other realm. The pure joy of understanding how a table-driven application goes from:
    work_q = {}

to:

    work_q = {
        {name="empty trash", when=os.time(), msg="The system trash is emptied.."},
        {name="copy file", when=os.time()+1000, msg="A backup is being performed..", cmd="tar -cvf /mnt/backups /home/*"},
    }
.. to a working C-layer on multiple platforms, all functioning exactly the same way, all working from a single binary .. well then, this joy is greater than many other language/systems of development.
Basically C is where it is because of a combination of timing and market forces. Market forces are usually underestimated by theoretical computer scientists, but they have a much stronger say in the success of a language than anything else.
However, I still believe knowing C inside out is an essential skill, and as bad as it is, it's better than most of the alternatives for "low level" code. They usually either try too hard to be different or don't offer enough to displace C and all the traction, coders, literature, and codebase around it. And, as you mentioned, there's the fact that architectures were historically forced to play nice with C's machine abstraction.
Could you give a few links to papers on Burroughs systems and the Connection Machine? Are this article on LtU [1] and Daniel Hillis's Connection Machine paper and book the sources I ought to look at?
Titles to some papers or some comparisons on a mailing list would be awesome to have if you know any off the top of your head.
> The social necessity of implementing a decent C compiler may have stifled our architecture development
The idea that specific requirements of C had started driving architecture development blew my mind the first time I came across it. It is still a point that does not get discussed nearly enough at a popular level.
None of what you said supports your initial claim that the author is ignorant. In fact, even if we pretend that everything you said is true, none of it contradicts the article or the author's conclusions. Yes, C has lots of areas where other languages are better. And yet, it is still the most practical language to use because of the combination of things that it does well.
Only in extremely limited contexts: low-level code for operating systems that happened to have been written in C. Otherwise, there is a better language for pretty much every use-case of C.
At least SBCL will generate efficient machine language for bit vector manipulations if you tell it the sizes of the vectors (or if it can infer that information). It might just be a matter of opinion, but I would say that Lisp beats the pants off C in terms of programming; you are almost never going to have to keep track of pointers or memory allocations, you have far less undefined behavior to deal with, and it is generally a more expressive language. SBCL and CMUCL also support a non-standard feature that is similar to inline assembly language in C, the VOP system, that allows you to utilize obscure instructions or change how your code will be generated, if there is some reason for doing that (e.g. if your encryption library will use AESNI).
Of course, since you said "library," I assume you meant for this to be used in other programs, possibly programs not written in Lisp. Unfortunately, SBCL's support for that is still rough; commercial Lisp compilers may have better support for it. Of course, if you were targeting an OS written in Lisp, things would probably be different -- my view is that C's popularity is mostly a result of how many systems were written in C i.e. how much legacy code there is, and that were it not for that momentum C would be written off as an inexpressive and difficult to work with language.
"You say that like it is an accident that just about every major OS today is written in C or C/C++"
Is there some technical feature of C or C++ that makes those languages good for writing OSes, or that made OSes written in those languages overwhelmingly successful? You might argue that the simplicity of a C compiler made Unix successful because it helped with portability, but C is by no means unique in this regard -- a Lisp compiler can be equally simple, and can be used to bootstrap a more sophisticated compiler. Really, Unix won because the only real competition Unix vendors faced in the minicomputer and workstation market came from other Unix vendors; C was riding on the coattails of Unix. One would be hard-pressed to argue that Windows won because C++ is a great language, especially considering how Windows was plagued with bugs and security problems during its rise to prominence in the 90s (many of which resulted from bad pointer mechanics, which is what happens when you use a language that requires you to juggle pointers).
One would be hard-pressed to argue that Windows won because C++ is a great language
I'm not arguing that. I'm not even saying I know precisely why C or C/C++ is good. I'm just saying, when every major OS is written in it, can you really dismiss it so easily?
No, there isn't. What would you write a database in? Or an SMTP server, or IMAP, or HTTP, or whatever else. There's a reason all that stuff is almost always done in C.