I've been wanting a capability based security model for years. Argued about it here in fact. Capabilities are kind of an object pointer with associated permissions - like a unix file descriptor.
We should have:
- OS level capabilities. Launched programs get passed a capability token from the shell (or wherever you launched the program from). All syscalls take a capability as the first argument. So, "open path /foo" becomes open(cap, "/foo"). The capability could correspond to a fake filesystem, real branch of your filesystem, network filesystem or really anything. The program doesn't get to know what kind of sandbox it lives inside.
- Library / language capabilities. When I pull in some 3rd party library - like an npm module - that library should also be passed a capability too, either at import time or per callsite. It shouldn't have read/write access to all other bytes in my program's address space. It shouldn't have access to do anything on my computer as if it were me! The question is: "What is the blast radius of this code?" If the library you're using is malicious or vulnerable, we need to have sane defaults for how much damage can be caused. Calling lib::add(1, 2) shouldn't be able to result in a persistent compromise of my entire computer.
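A minimal sketch of the OS-level idea in Rust, assuming a made-up `FsCap` trait in place of a real kernel interface. `FakeFs`, `Subtree` and `program` are all hypothetical names for illustration. The point it tries to show is that `program` calls `open(cap, "/foo")` the same way whether it was handed the whole filesystem or a jailed branch of it.

```rust
use std::collections::HashMap;

/// Hypothetical capability: the only way the "program" below can reach files.
pub trait FsCap {
    /// Returns the file contents, or None if this capability does not expose the path.
    fn open(&self, path: &str) -> Option<String>;
}

/// A fake, in-memory filesystem (an assumption for the sketch, not a real OS API).
pub struct FakeFs {
    files: HashMap<String, String>,
}

impl FakeFs {
    pub fn new(files: &[(&str, &str)]) -> Self {
        let files = files
            .iter()
            .map(|(path, contents)| (path.to_string(), contents.to_string()))
            .collect();
        FakeFs { files }
    }
}

impl FsCap for FakeFs {
    fn open(&self, path: &str) -> Option<String> {
        self.files.get(path).cloned()
    }
}

/// A branch of another filesystem: every path is resolved under `prefix`.
pub struct Subtree<'a> {
    parent: &'a dyn FsCap,
    prefix: String,
}

impl<'a> Subtree<'a> {
    pub fn new(parent: &'a dyn FsCap, prefix: &str) -> Self {
        Subtree { parent, prefix: prefix.to_string() }
    }
}

impl FsCap for Subtree<'_> {
    fn open(&self, path: &str) -> Option<String> {
        // Refuse ".." components so the holder cannot climb out of its branch.
        if path.split('/').any(|part| part == "..") {
            return None;
        }
        self.parent.open(&format!("{}{}", self.prefix, path))
    }
}

/// The "launched program": it sees only the capability it was handed and
/// cannot tell which kind of filesystem is behind it.
pub fn program(cap: &dyn FsCap) -> String {
    cap.open("/foo").unwrap_or_else(|| "<no access>".to_string())
}
```

Handing `program` a `FakeFs` or a `Subtree` of it changes what "/foo" resolves to without changing a line of the program itself.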
seL4 has fast, efficient OS level capabilities. It's had them for years. They work great. They're fast - faster than Linux in many cases. And tremendously useful. They allow for transparent sandboxing, userland drivers, IPC, security improvements, and more. You can even run Linux as a process in seL4. I want an OS that has all the features of my Linux desktop, but works like seL4.
Unfortunately, I don't think any programming language has the kind of language level capabilities I want. Rust is really close. We need a way to restrict a 3rd party crate from calling any unsafe code (including from untrusted dependencies). We need to fix the long standing soundness bugs in rust. And we need a capability based standard library. No more global open() / listen() / etc. Only openat(), and equivalents for all other parts of the OS.
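To make the "only openat()" idea concrete, here is a rough sketch of what a capability-style file API could look like on top of today's `std::fs`. The `capfs` module, `Dir`, `ambient`, `read_at` and `subdir` are hypothetical names, not an existing standard library API. It checks paths lexically only, so it does not defend against symlinks, and it relies on module privacy rather than on anything the compiler enforces about the rest of the program.

```rust
pub mod capfs {
    use std::fs;
    use std::io;
    use std::path::{Component, Path, PathBuf};

    /// Hypothetical directory capability. The field is private, so code outside
    /// this module cannot forge a `Dir` for a path it was never given.
    pub struct Dir {
        root: PathBuf,
    }

    impl Dir {
        /// The one ambient entry point; in this sketch only `main` should call it.
        pub fn ambient(root: impl Into<PathBuf>) -> Dir {
            Dir { root: root.into() }
        }

        /// Resolve `rel` under this directory, rejecting absolute paths and `..`.
        fn resolve(&self, rel: &str) -> io::Result<PathBuf> {
            let rel = Path::new(rel);
            let stays_inside = rel
                .components()
                .all(|c| matches!(c, Component::Normal(_) | Component::CurDir));
            if stays_inside {
                Ok(self.root.join(rel))
            } else {
                Err(io::Error::new(
                    io::ErrorKind::PermissionDenied,
                    "path escapes capability",
                ))
            }
        }

        /// Rough analogue of openat(): read a file relative to this capability.
        pub fn read_at(&self, rel: &str) -> io::Result<String> {
            fs::read_to_string(self.resolve(rel)?)
        }

        /// Derive a narrower capability for a subdirectory.
        pub fn subdir(&self, rel: &str) -> io::Result<Dir> {
            Ok(Dir { root: self.resolve(rel)? })
        }
    }
}
```

A library that is handed `root.subdir("data")` can read files under `data` but has no way to name anything above it. A real version would also need the global `std::fs` functions to be unavailable to that library, which is exactly the missing piece described above.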
If LLMs keep getting better, I'm going to get an LLM to build all this stuff in a few years if nobody else does it first. Security on modern desktop operating systems is a joke.
Capabilities have a lot of serious design problems which is why no mainstream language has them. Because this comes up so often on HN I wrote an essay explaining the issues here:
But as pointed out by others, this particular exploit wouldn't be stopped by capabilities. Nor would it be stopped by micro-kernels. The filesystem is a trusted entity on any OS design I'm familiar with as it's what holds the core metadata about what components have what permissions. If you can exploit the filesystem code, you can trivially obtain any permission. That the code runs outside of the CPU's supervisor mode means nothing.
The only techniques we have to stop bugs like this are garbage collection or use of something like Rust's affine type system. You could in principle write a kernel in a language like C#, Java or Kotlin and it would be immune to these sorts of bugs.
This essay only addresses my second point - capabilities within a program. It doesn't address OS level capabilities at all.
But even in the space of programming languages, I find this essay extremely unconvincing. Like, you raise points like this:
> Here are some problems you’ll have to solve in order to sandbox libraries: What is your threat model? How do you stop components tampering with each other’s memory?
The threat model is left pad cryptolockering your computer via a supply chain attack. The solution is to design a language such that if I import leftpad, then call it, my computer can't get hacked.
You stop components tampering with each others' memory by using a memory safe language.
> its main() method must be given a “god object” exposing all the ambient authorities the app begins with
So what? The main function already takes arguments. I don't understand the problem.
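A small Rust sketch of what "main takes the god object" might look like. Everything here (`caps`, `Authority`, `Net`, `Disk`, `from_runtime`, `app_main`) is a hypothetical name, and the capabilities just return strings instead of doing real IO. It also assumes a standard library with no ambient authority, which Rust does not have today, so in real Rust `left_pad` could still reach `std::fs` on its own.

```rust
pub mod caps {
    /// Hypothetical capability types. The private unit field stops code outside
    /// this module from constructing one out of thin air.
    pub struct Net(());
    pub struct Disk(());

    /// The "god object": everything the process starts out being allowed to do.
    pub struct Authority {
        pub net: Net,
        pub disk: Disk,
    }

    impl Authority {
        /// Stand-in for whatever the runtime would do before calling main.
        pub fn from_runtime() -> Authority {
            Authority { net: Net(()), disk: Disk(()) }
        }
    }

    impl Net {
        /// Pretend network access; returns a description instead of doing IO.
        pub fn fetch(&self, url: &str) -> String {
            format!("GET {}", url)
        }
    }

    impl Disk {
        /// Pretend disk access; returns a description instead of doing IO.
        pub fn save(&self, name: &str) -> String {
            format!("saved {}", name)
        }
    }
}

/// Needs the network, so it has to say so in its signature.
pub fn download(net: &caps::Net, url: &str) -> String {
    net.fetch(url)
}

/// Takes no capability, so in this model it has nothing to abuse.
pub fn left_pad(s: &str, width: usize) -> String {
    format!("{:>1$}", s, width)
}

/// What `main` would look like: split the authority and hand out the pieces.
pub fn app_main(auth: caps::Authority) -> Vec<String> {
    let caps::Authority { net, disk } = auth;
    vec![
        download(&net, "https://example.com/"),
        disk.save("page.html"),
        left_pad("7", 3),
    ]
}
```

The god object only exists at the top of the program. Everything below it sees whichever pieces it was passed, and the signatures show who can touch what.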
Haskell already passes a type object as an argument to anything which does IO. They don't do it for security. Turns out having pure functions separated from non-pure functions is a beautiful thing.
Then there's these weird claims:
> Any mutable global variable is a problem as it may allow one component to violate expectations held by another.
You don't need to ban mutable global variables! Let's imagine we did this in safe Rust. I think the only constraint is that a global variable can't be shared over the boundary between crates. But - nobody does that anyway. Even if you did share a global over a crate boundary, the child crate would still only be able to access it through methods on the type.
Sneaky developers could leverage globals to violate the security boundary. But it would be hard to do by accident. Maybe just, don't do that.
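For what "only through methods on the type" means in practice, here is a sketch in safe Rust with a module standing in for a crate. `host`, `Registry`, `REGISTRY` and `plugin_init` are made-up names, and the const `Mutex::new` in a `static` assumes Rust 1.63 or newer.

```rust
pub mod host {
    use std::sync::Mutex;

    /// A type with a private field: other modules (or crates) can hold a
    /// reference to it but cannot reach inside.
    pub struct Registry {
        names: Mutex<Vec<String>>,
    }

    impl Registry {
        /// The only mutation the owner chose to expose: append a name.
        pub fn register(&self, name: &str) -> usize {
            let mut names = self.names.lock().unwrap();
            names.push(name.to_string());
            names.len()
        }

        /// Read-only view of how many names are registered.
        pub fn count(&self) -> usize {
            self.names.lock().unwrap().len()
        }
    }

    /// A mutable global that is deliberately shared across the boundary.
    pub static REGISTRY: Registry = Registry {
        names: Mutex::new(Vec::new()),
    };
}

/// Stand-in for code living in a different crate.
pub fn plugin_init() -> usize {
    // `host::REGISTRY.names.lock()` would not compile here: the field is private.
    // The plugin can append, but it cannot clear or rewrite the list.
    host::REGISTRY.register("plugin")
}
```

The global is still mutable and still shared, but the other side can only do what the owning type lets it do.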
Your essay talks about some research project making a capability based java subset. And I understand that the resulting ergonomics weren't very good. But that isn't evidence that capabilities themselves are a bad idea. If a research student wrote a half baked C compiler one time, you wouldn't take that as evidence that C compilers are a bad idea. I do, however, accept that the burden of proof is on me to demonstrate that it's a good idea. I hope that I can some day rise to that challenge.
> The filesystem is a trusted entity on any OS design I'm familiar with
That's not how capability based microkernels like seL4 work. The filesystem is owned by a specialised process. Other processes only modify files by sending messages to the filesystem process via a capability handle. If nobody created a writable file handle, the file can't be arbitrarily mutated by another module. Copyfail happened because in Linux, any code can by default interact with the page table. One piece of code was missing access control checks. In capability based systems, it's basically impossible to accidentally forget access control checks like that.
> The only techniques we have to stop bugs like this are garbage collection or use of something like Rust's affine type system. You could in principle write a kernel in a language like C#, Java or Kotlin and it would be immune to these sorts of bugs.
Copyfail is a logic bug. C#, Java or Kotlin wouldn't save you from it at all.
The article talks about OS capabilities in the second part when it discusses Mojo, which is based on IPC.
> The solution is to design a language such that if I import leftpad, then call it, my computer can't get hacked.
That requirement may seem clear right now, but the moment you talk to other people about your language you'll find there's no agreement on what "get hacked" means. Some people will consider calling exit(0) repeatedly to be "hacked" because it's a DoS attack, others will say no code execution or priv escalation happened, so that's not being hacked. Some will say that left-pad being able to read arbitrary bytes from your address space is being hacked, others will say no harm done and thus it wasn't being hacked. The details matter and you need to nail them down in advance.
It turns out for example that one of the top uses of the Java SecurityManager was just to stop plugins accidentally calling System.exit() and tearing down the whole process. It wasn't even a security goal, really.
> You stop components tampering with each others' memory by using a memory safe language.
That's not enough. See languages like Ruby or JavaScript, which are memory safe but not sandboxable due to all the monkeypatching they allow.
> Haskell already passes a type object as an argument to anything which does IO. They don't do it for security. Turns out having pure functions separated from non-pure functions is a beautiful thing.
But almost nobody uses Haskell, partly because of poor ergonomics like this! So if you want a language that gets wide usage and has a good library ecosystem, monads for everything probably isn't going to take off.
> If nobody created a writable file handle, the file can't be arbitrarily mutated by another module.
We're talking about critical bugs in the filesystem so what the FS process's idea of a file handle is doesn't really matter. If you can confuse or buffer overflow the FS process by sending it messages, you can then edit state inside that process you weren't supposed to be able to access, and as that process controls the security system for everything it's game over. Microkernels have no way to stop this, which is one reason very few operating systems move the core FS out into a separate process. You can't easily survive a crash of the core FS code, and it being exploited is equivalent to an exploit of the core microkernel anyway in terms of adversarial goals. So you might as well just run it in-kernel and reap the performance benefits.
> > Haskell already passes a type object as an argument to anything which does IO. They don't do it for security. Turns out having pure functions separated from non-pure functions is a beautiful thing.
> But almost nobody uses Haskell
Sad, but true
> partly because of poor ergonomics like this!
I'm somewhat dubious that's the reason, partly because I find such ergonomics excellent! Especially those provided by my capability system Bluefin: https://hackage.haskell.org/package/bluefin
> We're talking about critical bugs in the filesystem so what the FS process's idea of a file handle is doesn't really matter.
The copyfail bug wasn’t a bug in the filesystem code. It was a bug in the crypto algorithm code, which wrote to the filesystem page table without checking if the process invoking it had permission to write to the passed file handle. In a monolithic kernel like Linux, every subsystem can access the memory of every other subsystem by default. It’s up to each subsystem to be careful. As we keep discovering, “be really careful” is not a successful security strategy.
A capability based OS like SeL4 is more secure. With SeL4, you would put the crypto algorithms and filesystem in separate user space processes. These processes would only communicate by RPC, by invoking capabilities. We can imagine how the copyfail scenario would play out: A user process has a capability representing its (read only) access to some privileged file on disk. It passes that capability to the crypto algorithm process. A bug - or even complete takeover - of the crypto algorithm process still doesn’t change that the file cap is read only. The crypto algorithm process doesn’t have direct access to the memory representing that file. It only has the read only file handle. All it can do with that handle is invoke it, which will only give it read access. Even with a bug in the crypto algorithms process, the OS would stay secure.
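The scenario above can be sketched in Rust, with module privacy standing in for the process boundary. This is only an analogy: in seL4 the separation comes from the kernel and separate address spaces, not from a language. `fs_server`, `FileCap`, `Denied` and `compromised_crypto_service` are hypothetical names.

```rust
pub mod fs_server {
    use std::cell::RefCell;
    use std::rc::Rc;

    /// Error returned when a capability lacks the right being exercised.
    #[derive(Debug, PartialEq)]
    pub struct Denied;

    /// A file capability: a handle to storage owned by the filesystem server,
    /// plus the rights this particular handle carries. Both fields are private.
    pub struct FileCap {
        data: Rc<RefCell<Vec<u8>>>,
        writable: bool,
    }

    /// Create a file and return a fully privileged capability to it.
    pub fn new_file(contents: &[u8]) -> FileCap {
        FileCap {
            data: Rc::new(RefCell::new(contents.to_vec())),
            writable: true,
        }
    }

    impl FileCap {
        pub fn read(&self) -> Vec<u8> {
            self.data.borrow().clone()
        }

        /// The access check lives in the one place every write must go through.
        pub fn write(&self, bytes: &[u8]) -> Result<(), Denied> {
            if !self.writable {
                return Err(Denied);
            }
            *self.data.borrow_mut() = bytes.to_vec();
            Ok(())
        }

        /// Derive a weaker capability; there is no method that adds rights back.
        pub fn read_only(&self) -> FileCap {
            FileCap { data: Rc::clone(&self.data), writable: false }
        }
    }
}

/// Stand-in for the crypto service, written as if it were fully compromised:
/// it does its job and then tries to overwrite the file it was given.
pub fn compromised_crypto_service(
    input: &fs_server::FileCap,
) -> (usize, Result<(), fs_server::Denied>) {
    let bytes_processed = input.read().len();
    let attack = input.write(b"pwned");
    (bytes_processed, attack)
}
```

The service never holds the underlying bytes, only the handle, so the worst it can do with a read-only handle is read.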
Yes, capability OSes aren't a magic bullet. A bug in the filesystem process could still result in filesystem corruption. But better is better. OS capabilities provide defence in depth. They would have prevented copyfail.
As far as I can tell, your argument against capabilities is that they might be slow. Some implementations have poor ergonomics. They don’t magically solve every possible security bug. You also, personally, used a bad implementation of capabilities this one time years ago in Java. Is that accurate?
You must see how unconvincing I find your argument. What are you even trying to do? Convince people to not explore different ideas in computer science? When I close my eyes I see an old man yelling: “Hey you kids! What are you doing up there, trying new things? You stop that right now!”
I don't recall making a performance argument against capabilities, but I think we're conflating microkernels and capability based languages. You can have capabilities without microkernels and that's often what people mean when they talk about passing caps into main(). Context switching at the hardware level does have a performance cost, so if you want to use lots of capabilities without a special programming language then you're going to pay for that yes.
I don't think there are any good mainstream capability based programming languages. At least I've never seen one. Actually the SecurityManager is I think the best implementation that has existed. I've not yet seen a credible proposal that's better. Stuff like Mojo and SEL4 is at least deployed to production but that's not a programming language.
> What are you even trying to do? Convince people to not explore different ideas in computer science?
No. Please go read the opening of the article again, which says: "In this essay I want to show you the challenges that you’ll face if you want to walk that path. This isn’t meant to put anyone off, just to draw a map of the territory you’re about to enter and explain why it’s currently deserted."
Lots of people have proposed capabilities as some silver bullet over the years, yet real systems hardly use them. Anyone who is serious about their own ideas should want to understand why that is and that's the goal of the article. It doesn't say nobody can do better! The whole point of writing it, is the hope that someone will. But to do better you have to understand why existing systems failed. It wasn't (primarily) about performance.
> If you can confuse or buffer overflow the FS process by sending it messages, you can then edit state inside that process you weren't supposed to be able to access, and as that process controls the security system for everything it's game over.
The assumption here is that the FS is the root of trust for the kernel. (A claim I consider dubious, but what do I know about knowing things?) It's another way to say that if you don't harden your root of trust, you're SOL. Which, ok, fair enough. But that's frankly irrelevant because hardening the root of trust is table stakes. The system cannot be secured without it, regardless of the threat model.
All of the concerns about a definition of "getting hacked" fall out of ignoring the hardening of the root of trust. I don't wish to put words in your mouth, but my interpretation of the argument is essentially, "we can't have nice things because the root of trust cannot be hardened sufficiently to prevent all intrusions."
Iff the FS is the root of trust, and it is not possible to confuse the FS by sending it messages, then there is no game over. You have a root of trust that cannot be broken.
> Microkernels have no way to stop this, which is one reason very few operating systems move the core FS out into a separate process.
My reading of the history reaches a very different conclusion. First, the primary reason that very few operating systems in practice use a microkernel design is because Linus Torvalds believed it was too slow for early 90's hardware [1]. And everyone else just does whatever Linux is doing.
Second, security through surface area reduction (and more broadly, defense-in-depth) was always the point of the microkernel design [2]. Trivially, the principle of least privilege is how one arrives at a secure system. Monolithic kernels, to this very day, continue to prove that they cannot be secured in any practical manner. I can only assume we need things to get worse before kernel developers will tighten up and take security seriously.
> So you might as well just run it in-kernel and reap the performance benefits.
There's that same mentality. Apparently "speed at all costs" is the willful trading of security for performance. That position is just as flawed as trading essential liberty for temporary safety [3]. It doesn't matter how fast the thing is when the slightest bump always causes it to explode, killing everyone on board.
Ah, I'm not saying we can't have nice things or build more secure software. I think we can build more secure software! But the argument I'm responding to is one that I've seen many times over the years on HN and elsewhere, which is some form of "capability based programming languages fix everything". It's always posited as obvious and easy, as if merely saying "capability based language" is the only explanation required and somehow the entire software industry just missed the memo. Microkernels often come along for the ride, but not always.
You're completely right that the root of trust has to be secured. I argue that the core filesystem is indeed a part of the ROT, which is why e.g. Apple has put so much effort into making it immutable and fully tied to a cryptographic root hash that's checked by the secure boot process. Moving the FS out of the core kernel wouldn't change much though - if you have a bug in your FS code at runtime then you're just SOL even if everything is arranged in a Merkle tree.
The argument being made by josephg in the sibling comment is that in SEL4 or similar the page cache would be separated from the crypto code. And maybe he's right, but the better way to get the same outcome is to not have IPsec in the kernel rather than not have the core FS - as the latter is a ROT and IPsec isn't.
I disagree that the question of what "getting hacked" means is a reformulation of trust roots. A threat model isn't the same thing as a root of trust. The argument over what appears to be minor semantics is important because it scopes your goals and effort. One of the most common failure modes I've seen in security projects is not defining a threat model up front, often leading to an automatic fallback to "the threat model contains everything" followed by despondency and failure when it turns out to be impossible.
I don't think Apple or Microsoft care much about Linus' opinions tbh. Both NeXT/macOS and Windows NT started out as microkernel designs and all of them have oscillated back and forth over the years. The original concept was indeed far too slow and a lot of functionality went back to monolithic. Then over time some functionality got lifted back out e.g. the GUI subsystem on Windows. Core FS remains though in any OS as the cost/benefit ratio of moving it is so poor.
> "capability based programming languages fix everything"
There is some truth to this idea, though. Setting aside the unsafe boundary, (FFI, direct MMIO access, etc.) a capability system in a programming language would solve some kinds of these problems. Not all; it doesn't solve logic bugs when a capability is in scope.
> It's always posited as obvious and easy
I do believe it's probably pretty obviously true, by now. But not at all easy.
> Moving the FS out of the core kernel wouldn't change much though - if you have a bug in your FS code at runtime then you're just SOL even if everything is arranged in a Merkle tree.
Perhaps, but that's only because traditional file systems are global state. A capability system turns that notion on its head specifically because global state is really the problem. The combination of capabilities and user mode file access would be quite a strong isolation boundary. The bug(s) would have to be "trivially flawed" in a way that these subtle exploits are not.
> A threat model isn't the same thing as a root of trust.
Ah, I didn't say that. I said (roughly) that security relies on a strong root of trust for every threat model. I think the distinction is important. They are not the same, but the threat model can be completely ignored (because it doesn't matter) until the root of trust is secured. In other words, a weak root of trust fails all threat models.
> I don't think Apple or Microsoft care much about Linus' opinions tbh.
True. macOS and NT are (or were?) "microkernel-ish" the last time I was in those weeds. No idea how they've evolved since.
You've made some good points, as well. I see where you are coming from.
We agree that a properly sandboxable capability-capable (ugh, lol) language would indeed be a really good security upgrade. I was sad when the SecurityManager died for that reason, even though the reasoning was very understandable.
But those claims have also got to be moderated. As no such thing has ever existed, we can't truly know how well it'd work in practice. Only experience can tell us that.
Global state is one of the key issues. Joe-E simply banned it, which is far too harsh and breaks almost everything. Mobile operating systems locked down filesystem access behind permissions and capabilities quite dramatically and were much more secure, but that came with a lot of 'vigorous' debate over owner control and power for productivity/pro-grade applications. macOS has taken an incremental approach and sandboxes off parts of the FS from apps whilst retaining what looks on the surface like a classical global shared state $HOME and / directory (although it's not).
macOS, iOS, Android and Windows have all been steadily moving code out of the kernel over the years. Apple doesn't run the core FS in a userspace process but every other FS that's not as performance sensitive is now a userspace daemon, for instance. They developed their own FUSE equivalent to do this. In Windows a lot moved out in Vista. Graphics, audio, printing, a lot of drivers are out of kernel now.
Linux has lagged behind quite badly in this respect partly because a microkernel design requires close cooperation between userspace and kernel space but the Linux design philosophy is that the kernel is a self-contained artifact.
Note that capabilities would not help for those bugs we are discussing today.
Those exploits are in kernel, and the userspace is only calling the normal, allowed calls. Removing global open()/listen()/etc.. with capability-based versions would still allow one to invoke the same kernel bugs.
(Now, using a microkernel like seL4 where the kernel drivers are isolated _would_ help, but (1) that's independent from what userspace does, you can have a POSIX layer with seL4 and (2) that would be many more context switches, so a performance drop)
> Note that capabilities would not help for those bugs we are discussing today.
Yes they would. Copyfail uses a bug in the linux kernel to write to arbitrary page table entries. A kernel like SeL4 puts the filesystem in a separate process. The kernel doesn't have a filesystem page table entry that it can corrupt.
Even if the bug somehow got in, the exploit chain uses the page table bug to overwrite the code in su. This can be used to get root because su has suid set. In a capability based OS, there is no "su" process to exploit like this.
A lot of these bugs seem to come from Linux's monolithic nature meaning (complex code A) + (complex code B) leads to a bug. Microkernels make these sorts of problems much harder to exploit because each component is small and easier to audit. And there's much bigger walls up between sections. Kernel ALG support wouldn't have raw access to overwrite page table entries in the first place.
> (2) that would be many more context switches, so a performance drop
I've heard this before. Is it actually true though? The SeL4 devs claim the context switching performance in sel4 is way better than it is in linux. There are only 11 syscalls - so optimising them is easier. Invoking a capability (like a file handle) in sel4 doesn't involve any complex scheduler lookups. Your process just hands your scheduler timeslice to the process on the other end of the invoked capability (like the filesystem driver).
But SeL4 will probably have more TLB flushes. I'm not really sure how expensive they are on modern silicon.
I'd love to see some real benchmarks doing heavy IO or something in linux and sel4. I'm not really sure how it would shake out.
Yes. But it's nowhere near as powerful as capabilities.
- Pledge requires the program drop privileges. Process level caps move the "allowed actions" outside of an application. And they can do that without the application even knowing. This would - for example - let you sandbox an untrusted binary.
- Pledge still leaves an entire application in the same security zone. If your process needs network and disk access, every part of the process - including 3rd party libraries - gets access to the network and disk.
- You can reproduce pledge with caps very easily. Capability libraries generally let you make a child capability. So, cap A has access to resources x, y, z. Make cap B with access to only resource x. You could use this (combined with a global "root cap" in your process) to implement pledge. You can't use pledge to make caps.
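The last bullet can be sketched in a few lines of Rust. The rights names, `Cap`, `restrict` and `Process` are all made up for illustration, and a plain struct stands in for the process-wide "root cap". The property being shown is that a child capability can only ever hold a subset of its parent's rights, and that a pledge-like call is just the process replacing its own cap with a narrower child.

```rust
pub mod cap {
    // Hypothetical rights, standing in for resources x, y and z.
    pub const READ_DISK: u8 = 1 << 0;
    pub const NETWORK: u8 = 1 << 1;
    pub const SPAWN: u8 = 1 << 2;

    /// A capability is a set of rights. The field is private, so the only way
    /// to get a new `Cap` outside this module is `root()` or `restrict()`.
    #[derive(Debug, Clone, Copy, PartialEq)]
    pub struct Cap {
        rights: u8,
    }

    impl Cap {
        /// Cap A: access to x, y and z.
        pub fn root() -> Cap {
            Cap { rights: READ_DISK | NETWORK | SPAWN }
        }

        /// Make a child cap. The bitwise AND means rights can be dropped
        /// but never added, whatever the caller asks for.
        pub fn restrict(self, keep: u8) -> Cap {
            Cap { rights: self.rights & keep }
        }

        /// True only if every bit in `right` is held by this cap.
        pub fn allows(self, right: u8) -> bool {
            (self.rights & right) == right
        }
    }
}

/// pledge built from caps: one process-wide cap that can only shrink.
pub struct Process {
    current: cap::Cap,
}

impl Process {
    pub fn start() -> Process {
        Process { current: cap::Cap::root() }
    }

    /// Like pledge: promise to use only `keep` from now on.
    pub fn pledge(&mut self, keep: u8) {
        self.current = self.current.restrict(keep);
    }

    pub fn allows(&self, right: u8) -> bool {
        self.current.allows(right)
    }
}
```

Going the other way does not work: pledge has one set of promises per process, so there is nothing to hand a library that differs from what the rest of the process holds.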
There's an interesting distinction here where one approach is to build sandboxes that limit exposure, while the other is just allowing the program to be more secure.
One approach is "Trust No Code" and the other is "Trusted code should run safely".
The first one sounds better on paper, but leads to a very complicated system. That said, I haven't worked with jails much or other forms of sandboxing. It just seems to me that to make software function you need escape hatches, and the more of those you have, well, now you're back to plugging exploits with a more complicated system.
It was interesting to me to hear that even though OpenBSD had designed their software to limit permissions even before pledge and unveil were released - upon release they found that a shocking amount of their software actually wasn't following their own rules.
> Re OpenBSD: I think it just shows we’re all human(fallible) at the end of the day :)
Yeah. It's yet another reminder that "being really careful" isn't an adequate security policy. Attackers only need to find 1 bug. Defenders need to protect everything. In large systems, you need defence in depth. Pledge? Yeah. NX? Yeah. Process isolation between subsystems? Yeah, let's have that too. Static verification? Love it. Rust's borrow checker? Sure. We need it all.