I'm confused why BPF exists in the first place. Can't we just compile kernel modules that hook into the tracing infrastructure?

It seems like WebAssembly for the kernel, but local software has the benefit of knowing the platform it is running on. Why compile C code to eBPF when I can just compile it to native code directly?

I can potentially see it solving a permissions problem, where you want to give unprivileged users in a multi-tenant setup the ability to run hooks in the kernel. Is that actually a common use case? I don't think it is.



Yes, you can just compile kernel modules, but you risk crashing the kernel. eBPF provides a safe way to interact with the kernel because it is not Turing complete and imposes additional restrictions. SystemTap is another example of such a language, but it compiles to kernel modules instead.

This is quite important when you want to run this code in production. You don't want to accidentally crash your kernel.


I'm not sure this argument makes sense. Avoiding accidentally crashing the kernel doesn't require a BPF layer.

For instance, you could just write your kernel module in a sufficiently safe language, like Rust, and have the same benefits. You could even pre-compile eBPF for the exact same level of safety. Still no need for the bpf() system call or the eBPF VM or JIT in the kernel.


(e)BPF has the following guarantees:

* Strictly typed -- registers and memory are type-checked at load time by the in-kernel verifier. If you used something like Rust, you'd have to bring rustc into the kernel

* Guaranteed to terminate -- you cannot jump backwards, and there is an upper bound on the instruction count

* Bounded memory -- the registers, the small fixed-size stack (512 bytes), and the memory accessible via maps are all fixed in size.

Compiling Rust to this is possible, but it'd require quite a bit of infrastructure in the kernel to verify that the code is safe, versus the simplicity of eBPF. Early attempts at a general-purpose in-kernel VM involved passing an AST in and then doing safety checks on the AST, but they proved too complicated to do safely.


I'm not arguing against eBPF the language. Its safety guarantees make sense to me.

I'm arguing against the in-kernel eBPF infrastructure: bpf system call, the JIT and the VM.

I think it makes more sense to just compile eBPF (or rust or whatever safe language you want) to a kernel module.


The idea with having eBPF in the kernel is that we can limit the amount of trust given to a particular user-space task.

Accepting compiled stuff in the form of a kernel module requires root privileges and requires that the kernel essentially have complete trust in the code being loaded.

Loading eBPF eliminates the need to trust the process/user doing the loading to that level.


The bpf() system call and SOCK_RAW both require root. Is there an example of using bpf that doesn't require root?


The bpf() syscall doesn't require CAP_SYS_ADMIN in general, only for specific invocations. You can set up a socket filter without CAP_SYS_ADMIN, and a device or XDP filter with CAP_NET_ADMIN.


Sure, but how common is that case? How common are multi-tenant Linux systems with untrusted users that grant those specific permissions? Do you want untrusted users sniffing other people's packets?


I love Rust, but it's not a panacea. It'll prevent memory errors and type errors in a lot of cases, but that's not the only way you can crash a kernel. Logic errors, or passing the wrong data across the kernel's interfaces, can kill processes, lock up the kernel, or cause it to corrupt data. The eBPF interfaces by design don't suffer from these problems because of their restricted nature. They deliberately say there are things you can't compute there, so they don't have to solve the halting problem, among other things!


Can you give us an example of such a module?


I can, but I don't see why that's necessary. It's plain to see that it's possible, and it performs better in production since it avoids the JIT step.

https://github.com/tsgates/rust.ko


RESF!! Why have sandboxes when Rust solves every programming error!!


If a kernel module crashes, you panic the kernel (normally).

eBPF probes can't crash and are deterministically safe (they aren't actually Turing complete), so you are unlikely to heavily impact application performance.


If you write your kernel module in eBPF (by pre-compiling to native code) it can't crash either.


BPF was initially added for packet filtering, IIRC. Compiling a kernel module for each filtering rule you'd add would not really work out very well.

Since then, BPF has grown to be used by more subsystems, including tracing, and allows user programs to do advanced (and fast) things. See for example https://github.com/ahupowerdns/secfilter . AFAIK, this doesn't require privileges, which loading a kernel module would.


For experimentation and testing, a kernel module for each rule doesn't seem unworkable. Just hide all the details behind a nice tool.

For production, placing all rules in a single module seems best. If you could avoid the overhead of executing BPF in production, wouldn't you?

I agree with the privilege argument but I don't think normal users can filter packets or add tracing with the current situation either.


See the GitHub link I gave. Also, the Chromium sandbox doesn't require privilege elevation and uses seccomp BPF.


Having used eBPF/kprobes for work, I'd say the main advantage over a precompiled kernel module is convenience. It's much easier to write a C file that hooks a kernel function and reports back to a Python script than it is to build and maintain a kernel module and have it talk to some higher-level code.



