
This may be a naive question, but I cannot help but wonder: As C and Rust aim to be link-compatible, wouldn't it be easier to gradually replace parts of the Linux kernel with Rust code?

The only technical problem I can think of is that Rust may not be available for all CPU architectures Linux supports, but this is just speculation on my part having done no research on the matter.



The Rust for Linux kernel project aims to enable writing Linux kernel device drivers in Rust.

See: https://lwn.net/Articles/862018/

The issue of Rust's LLVM-based compiler toolchain not targeting all CPU architectures is intended to be solved by the gccrs project.

See: https://lwn.net/Articles/871283/

I think the approach the Rust for Linux project is taking is wise: it's not about outright rewrites, but about focusing on those subsystems where Rust's intrinsic safety and security properties help the most.


Last I checked, one of the big problems was that Rust panics when you run out of memory, which is unacceptable in the kernel. Is there any progress on that?


Out of the box, Rust doesn't provide any way to allocate heap memory, so you can't run out of it.

Your ordinary userspace apps use std (the standard library), which relies on the alloc crate, and that provides heap allocation which indeed panics if the allocation fails. It does so because it correctly guesses that your "clever" strategy to handle allocation failure actually isn't, and will just triple fault anyway, so it should cut to the chase.

The kernel obviously doesn't have std, and Rust for Linux implements its own alloc crate.

Linus' requirement that you can fail memory allocation just means all the calls in alloc that can actually allocate memory now return Result to indicate whether the allocation was successful. This isn't how you'd do things in userspace, but, this isn't userspace so fine.


Can you elaborate as to why "this isn't how you'd do things in userspace, but, this isn't userspace so fine" holds?

Naive me - not a kernel dev at all - would argue that returning Result<Memory, AllocationError> is always better, even for userspace because it would allow me to additionally log something or gracefully deal with this.

Even if I don't want to deal with it, I could just `.unwrap()` or `.expect('my error message')` it.

Note: I am not trying to be snarky here, I genuinely don't know and would like to.

If answering this is too complex, maybe you can point me in the right direction so I can ask the right questions to find answers myself? Thanks in any case!


> it would allow me to additionally log something

If you don't have any memory your allocations are all failing. When you assemble the log message, the allocation needed to do that fails. Bang, double fault.

Now, often people don't really mean they want allocations to be able to fail generally, they're just thinking about that code they wrote that reads an entire file into RAM. If it was a 100GB file that would be a bad idea. But the best answer is: Guard the allocation you're actually worried about, don't ladle this into the fast path everybody has to deal with on every allocation.
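
A minimal sketch of guarding just the one allocation you're worried about, assuming a recent stable Rust where `Vec::try_reserve_exact` is available. `alloc_buffer` is a made-up helper name for illustration, not something from this thread:

```rust
use std::collections::TryReserveError;

/// Allocate a zero-filled buffer of `len` bytes, reporting failure
/// as an Err instead of aborting the process.
/// (`alloc_buffer` is a hypothetical helper, named here for illustration.)
pub fn alloc_buffer(len: usize) -> Result<Vec<u8>, TryReserveError> {
    let mut buf: Vec<u8> = Vec::new();
    // The one allocation we actually care about: returns Err on failure.
    buf.try_reserve_exact(len)?;
    // Capacity is already reserved, so this resize does not allocate again.
    buf.resize(len, 0);
    Ok(buf)
}
```

Only the call site that reads the huge file would use this; every other allocation stays on the ordinary infallible path.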


People say that "well if allocations fail all bets are off" but can't you pre-allocate memory for error handling?

Like sit down, figure out all the things you'll want to do on an allocation failure, and once you have determined that, slice off a little chunk of memory when you start your app (and maybe _that_ fails and you can't do anything). Then when you hit a failure, you do your thing and tear stuff down.
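
Roughly, that idea could be sketched like this. `EmergencyReserve` is a hypothetical type invented for this example, and on systems that overcommit, reserved-but-untouched memory may not be physically backed, so treat it as an illustration rather than a guarantee:

```rust
/// A block of memory set aside at startup and handed back to the
/// allocator when some later allocation fails.
/// (`EmergencyReserve` is a hypothetical type for illustration.)
pub struct EmergencyReserve {
    block: Option<Vec<u8>>,
}

impl EmergencyReserve {
    /// Reserve `bytes` up front. Returns None if even this allocation fails.
    pub fn new(bytes: usize) -> Option<Self> {
        let mut block = Vec::<u8>::new();
        block.try_reserve_exact(bytes).ok()?;
        Some(Self { block: Some(block) })
    }

    /// Free the reserve so the error path has memory to work with.
    /// Returns the number of bytes released, or 0 if it was already released.
    pub fn release(&mut self) -> usize {
        match self.block.take() {
            // `block` is dropped (and its memory freed) at the end of this arm.
            Some(block) => block.capacity(),
            None => 0,
        }
    }
}
```

The error path would call `release()` first, then do its logging and teardown using the memory that was just freed.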


It's what we used to do in the days when 4MB was a lot of memory. Batch programs would just abort but interactive programs had to have enough reserve to fail gracefully, possibly unwinding and releasing things until they could operate better.

Now that I see interactive programs taking a gigabyte and the system being ok, I guess we're in a different regime.


What if the failed allocation came from a different allocator/heap than from where the allocations for string logging came from?

In general, don't assume anything about your global process state just because one allocator fails.


Mhm, thanks.

It never occurred to me (being in non-embedded land) that returning an enum as the error or a &'static str instead of a heap structure like String, could also fail.

Seeing that Result isn't part of core, but of std, this makes sense.

Just to tickle my nerve though: theoretically speaking, with your example, it would work, right?

I couldn't allocate 100GB (because OOM or not even enough RAM to begin with) but it could be that the system can allocate the needed memory for error message just fine.

Very interesting.


Result is part of core [0]. Result data and/or errors can be stack-only data. The parent was just saying that many people that say they want to guard against out-of-memory issues aren't cognizant of just how difficult that is.
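
To make the "stack-only" point concrete, here is a small sketch that uses only `core` paths and a plain enum as the error type. `AllocError` and `check_request` are hypothetical names made up for the example:

```rust
/// A fieldless enum: constructing or returning it never touches the heap.
/// (Hypothetical error type for illustration.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AllocError {
    ZeroSized,
    TooLarge,
}

/// Validate a requested size against a limit. Both the Ok and Err values
/// are plain data that live on the stack or in registers.
pub fn check_request(bytes: usize, limit: usize) -> core::result::Result<usize, AllocError> {
    if bytes == 0 {
        Err(AllocError::ZeroSized)
    } else if bytes > limit {
        Err(AllocError::TooLarge)
    } else {
        Ok(bytes)
    }
}
```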

Add to that the fact that several operating systems will lie about whether you're out of memory, so the failure will often not show up in the Result value but arrive as a SIGKILL instead. It's just adding complexity.

People who are actually worried about it and know how to deal with it will be coding in a different style and can use the alloc library where and when they need to (at least once it gets stabilized in Rust).

[0] https://doc.rust-lang.org/core/result/


Thanks for correcting my error.

I've never checked core before, so I did when checking up for this discussion.

I somehow missed Result. Silly me didn't search on that page, but ofc I found it on std

https://doc.rust-lang.org/std/result/index.html

Also thanks for clarifying that values of Result can be stack-only!


Tialaramex answered this in their post already, and you almost answered the question yourself:

> I could just .unwrap() or .expect('my error message') it.

Panicking can allocate. Allocating can fail. Failing can panic. Panicking can allocate. Allocating can fail. You can bite yourself in the ass like a real Ouroboros.

IMO, a prerequisite to using fallible allocation APIs should be attempting to write your own allocator, handling the weird and wacky problem of initialising a data structure (for the heap) in such a way that if it fails, it fails without allocating but leaves some hint as to what went wrong.


Oh, wow, I was under the impression that the error message would be stack only, no heap involved, but as Result is part of the std library and not of core, this totally makes sense.

So for `Rust for Linux` they also need to implement a `Result-like` type that is stack only based to solve this issue, right?

If so, cool, thanks, you just made my day by tickling my learning nerves! :)


It has nothing to do with Result, whatsoever. Result does not allocate. If you used a Result that way, you could certainly try to "gracefully" handle the allocation failure, but if you think it would be easy, you would be wrong. As Tialaramex said, you are probably just going to make the problem worse because it is very difficult to ensure you do not attempt to allocate during allocation-failure-recovery. Rustc doesn't and can't really check this for you.

It actually has to do with `panic!(...)`. When you use `unwrap()`/`expect("...")`, you use the panic macro under the hood; parts of the panicking infrastructure use a boxed trait object which could contain a static string or formatted String or anything else really. The box can allocate if it is not a ZST. I believe the alloc crate's default handler tries to avoid this kind of thing, so that it can't fail to allocate AGAIN in the failure-handling routine. It will likely do a better job than you could.
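
A userspace sketch of what that payload looks like in practice, assuming std with the default unwinding panic strategy. `payload_kind` is a hypothetical helper written for this example; it only inspects the boxed payload that `catch_unwind` hands back:

```rust
use std::panic;

/// Run `f`; if it panics, report what kind of payload the panic carried.
/// (`payload_kind` is a hypothetical helper for illustration.)
pub fn payload_kind<F: FnOnce() + panic::UnwindSafe>(f: F) -> &'static str {
    match panic::catch_unwind(f) {
        Ok(()) => "no panic",
        // `payload` is a Box<dyn Any + Send>.
        Err(payload) => {
            if payload.is::<&'static str>() {
                "static str"
            } else if payload.is::<String>() {
                // A message formatted at runtime ends up as a heap String.
                "heap String"
            } else {
                "other"
            }
        }
    }
}
```

With a plain literal, as in `panic!("out of memory")`, I'd expect the static str case. A message built from runtime values, which includes what `unwrap()` produces on an Err, should come back as a String.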

This is a live issue at the moment, so to go into any more detail I'd have to read a bunch of recent Rust issues/PRs.


An addendum to tie this back to the original discussion: the reason kernel devs want these APIs more than userland is that (a) in a kernel, panicking = crashing the computer, which would be bad, and (b) they have a much bigger toolbox for handling OOM.

They can kill entire misbehaving processes. What are you going to do in your little program, clear a cache whose objects are sprinkled evenly across 150 different pages? You would need more control than you get from blindly using malloc/free/rust_alloc globally. Something like memcached would be able to use these APIs, because it uses its own allocator, and knows enough about its layout to predictably free entire pages at once.


> panicking = crashing the computer

That isn't very accurate. In Rust when programming in no_std, you can (must?) define your own panic handler:

https://doc.rust-lang.org/nomicon/panic-handler.html

Which you would define in the kernel. While I'm not going to speculate on exactly what the implementation would look like, you definitely do not need to "crash" the computer. I haven't done any kernel programming, but I'm guessing the kernel could do some things at that point with shared memory space that is already allocated to deal with this situation and try to recover in some way.

Edit: for example, I just found this in the kerla project: https://github.com/nuta/kerla/blob/88fd40823852a63bd639e602b...

That halts now, but it probably doesn't need to, or could do it conditionally based on the contents of PanicInfo.


Mm no, it's pretty accurate. For a start, notice that the Linux community has been very clear that panicking is unacceptable. The reason is that they cannot realistically do anything to recover.

> panic handler [...] Which you would define in the kernel. While I'm not going to speculate on exactly what the implementation would look like, you definitely do not need to "crash" the computer.

The panic handler loses so much of the context that crashing the computer is the only thing you can practically achieve. You can't retry an operation generically from within a panic handler; it doesn't know anything about the operation you were attempting. The OOM handler gets a Layout struct only. You could try unwinding or something, but within a syscall handler, I don't see how anything good can come from that. Unwinding in the kernel is simply a terrible idea. What else are you going to do?


I disagree that PanicInfo loses so much context. PanicInfo carries an arbitrary payload of &(dyn Any + Send).

Now there is a lot that the allocator could do. If you wanted something to be retriable, it could be interesting if the thing that failed was an async task. If so, that panic info could carry enough information to say, the failure was an OOM, here’s the task that failed, and it is marked as retriable. Yes, this would require a store of tasks somewhere in the kernel. Then based on it being an OOM, see if any memory can be reallocated before retrying, or wait to retry until it is.

This is where theoretically a new async-based Rust kernel, especially a micro-kernel, could be interesting. Is stack unwinding in the kernel a bad idea? Maybe. Can it be done in Linux? Maybe not, maybe it’s too much work to track all this information, but I disagree with the conviction with which you write it off.


Thanks for the thorough explanation!


Result already is “stack” based, or sized. It also already exists in core: https://doc.rust-lang.org/core/result/index.html

The Error type currently isn’t in core, but for other reasons, that just got resolved: https://twitter.com/bascule/status/1452029363197784071?s=21


AFAIK one other thing to note is that in Linux userspace, malloc (or fork) might succeed, but accessing the memory later can fail because of memory overcommit.


Yes, this is called "fallible allocations." You add methods with the "try_" prefix that work like the existing methods, except they return a Result which fails if it's out of memory instead of panicking.
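
For a feel of the "try_" style, here is a sketch built on stable `Vec::try_reserve`. This is not the Rust for Linux fork's code; `try_push` is a free function written for illustration only:

```rust
use std::collections::TryReserveError;

/// Push `value` onto `v`, returning Err instead of aborting if the
/// vector cannot grow. (Illustrative helper, not the kernel fork's API.)
pub fn try_push<T>(v: &mut Vec<T>, value: T) -> Result<(), TryReserveError> {
    // The only step that can allocate; failure is reported to the caller.
    v.try_reserve(1)?;
    // Room for one more element is guaranteed, so this push cannot allocate.
    v.push(value);
    Ok(())
}
```

Callers then propagate the error with `?` the same way they would for any other fallible operation.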

We have a light / temporary fork of the Rust stdlib allocator with fallible allocation support: https://github.com/Rust-for-Linux/linux/tree/rust/rust/alloc

See e.g. https://github.com/Rust-for-Linux/linux/commit/487d7578bd036...



>> one of the big problems was the Rust panics when you run out of memory, which is unacceptable in the kernel.

If the kernel is running out of memory, IMHO that's a bug. The kernel is ultimately responsible for memory management right? It needs to prioritize itself over everything else or the system is in trouble.


This sounds like a design choice that was made decades ago and permeated everything else in Linux. I don't think it's changeable at this point, if they wanted to.


That's what may happen with the Linux kernel over time, now that it allows parts written in Rust.

The limitation of that approach is that all internal interfaces (which includes most data structures) have to be C-compatible. You can get much of the safety benefits of Rust, but lose a lot of Rust's ergonomics.


I don't think that the Linux kernel has decided (yet) to allow Rust code into the kernel proper, has it?


It hasn't. But the attitude towards using Rust for device driver development has now gotten very positive, especially post the recent kernel summit.


An LWN article about Rust in Linux that's very relevant to this discussion: https://lwn.net/Articles/829858/


Google has decided to go forward anyway.

Just as Android's Linux kernel has compiled just fine with clang for the last five years or so, it now makes use of Rust.

https://source.android.com/setup/build/rust/building-rust-mo...

If upstream ever cares to support clang or Rust, that is another matter.


Unless I'm missing something, that link refers to building Android user-mode components in Rust, not kernel components in Rust.


That's correct, but it came after they rebuilt the Bluetooth stack: https://news.ycombinator.com/item?id=26647981


Yes, gradual replacement is easier and more practical for large projects. However, typical C APIs and idioms are different from idiomatic Rust. Rust uses more type-system features, generics, iterators, RAII, and prefers tree-like data structures and far fewer pointers (no linked lists!).

C rewritten in Rust is still very C-like, and requires refactorings that you can't do until it's all Rust.

BTW: There are two GCC-based Rust implementations in the works, so compatibility with exotic platforms is going to be solved.


A practical problem is that in order to gain the benefit of linking Rust and C code, one has to give up Rust's wonderful guarantees at the interface. So until islands of Rust meet up, these interact through a C ABI, and have to expose C-like behavior.


But you still get those guarantees and benefits in the Rust implementation itself, even if not at the interface (to non-Rust code). By a similar argument, a standalone Rust program "gives up" its guarantees whenever the process makes a call to libc or to the operating system (syscalls), but this isn't really a practical problem.
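
As a rough illustration of such a boundary: the C-facing function takes a raw pointer and a length, and everything behind it is ordinary safe Rust. The names `sum_bytes` and `sum_slice` are invented for the example, and a real export would also need a `no_mangle` attribute, which is left out here to keep the sketch minimal:

```rust
/// C-callable entry point: raw pointer plus length, no Rust guarantees.
/// (Hypothetical function; a real export would also be marked no_mangle.)
///
/// # Safety
/// `ptr` must be null or point to `len` readable bytes.
pub unsafe extern "C" fn sum_bytes(ptr: *const u8, len: usize) -> u64 {
    if ptr.is_null() {
        return 0;
    }
    // SAFETY: the caller promises `ptr` points to `len` readable bytes.
    let bytes: &[u8] = unsafe { core::slice::from_raw_parts(ptr, len) };
    sum_slice(bytes)
}

/// Everything past the boundary works on a checked slice in safe Rust.
fn sum_slice(bytes: &[u8]) -> u64 {
    bytes.iter().map(|&b| u64::from(b)).sum()
}
```

The unsafe part is confined to one line at the edge; the smaller the island, the larger the share of code that sits on such edges.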


Sure, I agree. The point is that if the total corpus of Rust code is too isolated into too small islands, then the boundary effects of each island may eat up a lot of the benefit.

I still think the project is really cool and worthwhile.


The world really needs _two_ projects, a practical one that is interoperable with the here and now, to get incremental improvements in security from bits reimplemented in Rust.

And then, in the longer term, someone ought to write a book about operating system implementation in Rust that ignores Linux interoperability and focuses on readability and maintainability by showing the use of Rust's idiomatic style.


It’s not naive, it’s just been speculated and talked about incessantly since Rust came out. The one naive part is underestimating how long that would take - Linux is one of the largest codebases on Earth. It would require massive coordination over years to rewrite, and in the end all you would get is the exact same product.

It’s just not an interesting or unique perspective, and pretty silly to suggest seriously.


The Linux kernel is massive, so writing any part of it in Rust is an enormous undertaking. Even a few functions at a time, we're talking decades.


Well, writing a kernel from scratch that supports as many CPU architectures and devices as Linux would take about as long, assuming one could attract a critical mass of developers.

Using an incremental approach at least gives us some benefits in the near future.

I'm not even saying a fresh start would be a bad idea. But incrementally replacing parts of Linux seems like a more promising approach, IMHO.



