On Tue, Aug 13, 2019 at 5:57 PM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
> hmm. No. Kernel developers should not make any assumptions.
> They should guide their design by real use cases instead. That includes studying
> what people do now and hacks they use to workaround lack of interfaces.
> Effectively bpf is root only. There are no unpriv users.
> This root applications go out of their way to reduce privileges
> while they still want to use bpf. That is the need that /dev/bpf is solving.
>
> >
> > > Containers are not providing the level of security that is enough
> > > to run arbitrary code. VMs can do it better, but cpu bugs don't make it easy.
> > > Containers are used to make production systems safer.
> > > Some people call it more 'secure', but it's clearly not secure for
> > > arbitrary code and that is what kernel.unprivileged_bpf_disabled allows.
> > > When we say 'unprivileged bpf' we really mean arbitrary malicious bpf program.
> > > It's been a constant source of pain. The constant blinding, randomization,
> > > verifier speculative analysis, all spectre v1, v2, v4 mitigations
> > > are simply not worth it. It's a lot of complex kernel code without users.
> >
> > Seccomp really will want eBPF some day, and it should work without
> > privilege. Maybe it should be a restricted subset of eBPF, and
> > Spectre will always be an issue until dramatically better hardware
> > shows up, but I think people will want the ability for regular
> > programs to load eBPF seccomp programs.
>
> I'm absolutely against using eBPF in seccomp.
> Precisely due to discussions like the current one.

I still don't really agree with your overall premise. If eBPF is
genuinely not usable by programs that are not fully trusted by the
admin, then no kernel changes at all are needed. Programs that want to
reduce their own privileges can easily fork() a privileged subprocess
or run a little helper to which they delegate BPF operations.
This is far more flexible than anything that will ever be in the kernel
because it allows the helper to verify that the rest of the program is
doing exactly what it's supposed to and restrict eBPF operations to
exactly the subset that is needed. So a container manager or network
manager that drops some privilege could have a little bpf-helper that
manages its BPF XDP, firewalling, etc. configuration. The two processes
would talk over a socketpair.

The interesting cases you're talking about really *do* involve
unprivileged or less privileged eBPF, though. Let's see:

systemd --user: systemd --user *is not privileged at all*. There's no
issue of reducing privilege, since systemd --user doesn't have any
privilege to begin with. But systemd supports some eBPF features, and
presumably it would like to support them in the systemd --user case.
This is unprivileged eBPF.

Seccomp. Seccomp already uses cBPF, which is a form of BPF although it
doesn't involve the bpf() syscall. There are some seccomp proposals in
the works that will want some stuff from eBPF. In particular, the
ability to call seccomp-specific bpf functions from a seccomp program
could be very nice. Similarly, the ability to use the enhanced
instruction set and maybe even *read* maps would be nice. I do think
that seccomp will continue to want its programs to be stateless.

So it's a bit of a chicken-and-egg situation. There aren't major
unprivileged eBPF users because the kernel support isn't there.

>
> >
> > > Hence I prefer this /dev/bpf mechanism to be as simple as possible.
> > > The applications that will use it are going to be just as trusted as systemd.
> >
> > I still don't understand your systemd example. systemd --user is not
> > trusted systemwide in any respect. The main PID 1 systemd is root.
> > No matter how you dice it, granting a user systemd instance extra bpf
> > access is tantamount to granting the user extra bpf access in general.
>
> People use systemd --user while their kernel have 'undef CONFIG_USER_NS'.

I don't know what you're getting at. I'm typing this email in a browser
running under a systemd --user instance, and there are no user
namespaces involved.

$ ps -u luto |grep systemd
1944 ? 00:00:02 systemd

$ stat /proc/1944
...
Access: (0555/dr-xr-xr-x) Uid: ( 1000/ luto) Gid: ( 1000/ luto)
Context: unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023

$ gdb -p 1944
[snipped tons of output, but gdb works fine like this]

systemd --user is not privileged. Giving it /dev/bpf as imagined by the
current set of patches would be a gaping security hole.

>
> I think there should be no unprivileged bpf at all,
> because over all these years we've seen zero use cases.
> Hence all new features are root only.

You're the maintainer. If you feel this way, then I think you should
just drop the /dev/bpf idea entirely and have userspace manage all of
this by itself. It will remain extremely awkward for containers and
especially nested containers to use eBPF.

--Andy
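
For illustration only, the helper arrangement described earlier in this
message (a program forks a small helper, talks to it over a socketpair,
and the helper restricts requests to exactly the subset that is needed)
might be sketched in Python roughly as follows. Everything named here is
hypothetical: the operation names in ALLOWED_OPS, the line-based
"OK"/"DENIED" reply protocol, and the functions helper_loop() and
run_with_helper(). The sketch makes no bpf() calls and drops no
privileges; it only shows the policy check and the request/reply
plumbing.

```python
import os
import socket

# Hypothetical allowlist: the only operations this helper agrees to perform.
ALLOWED_OPS = {"attach_xdp", "update_firewall_map"}


def helper_loop(sock):
    """Helper side: answer one request per line until the peer closes."""
    with sock, \
            sock.makefile("r", encoding="utf-8") as rfile, \
            sock.makefile("w", encoding="utf-8") as wfile:
        for line in rfile:
            op = line.strip()
            # A real helper would perform the BPF operation here. This
            # sketch only applies the allowlist and reports the verdict.
            verdict = "OK" if op in ALLOWED_OPS else "DENIED"
            wfile.write(f"{verdict} {op}\n")
            wfile.flush()


def run_with_helper(requests):
    """Fork a helper, send it each request in turn, and return its replies."""
    parent_sock, child_sock = socket.socketpair()  # connected AF_UNIX pair
    pid = os.fork()
    if pid == 0:
        # Child: becomes the helper. In a real deployment this is the
        # side that would keep the privilege needed for BPF operations.
        parent_sock.close()
        try:
            helper_loop(child_sock)
        finally:
            os._exit(0)

    # Parent: the main program. In a real deployment it would drop its
    # own privileges at this point (not done in this sketch).
    child_sock.close()
    replies = []
    with parent_sock, \
            parent_sock.makefile("r", encoding="utf-8") as rfile, \
            parent_sock.makefile("w", encoding="utf-8") as wfile:
        for op in requests:
            wfile.write(op + "\n")
            wfile.flush()
            replies.append(rfile.readline().strip())
    # Closing the socket above gives the helper EOF, so it exits.
    os.waitpid(pid, 0)
    return replies
```

With these assumptions, run_with_helper(["attach_xdp", "load_kprobe"])
returns ["OK attach_xdp", "DENIED load_kprobe"]: the first operation is
on the allowlist and the second is not. The point of the design is that
the policy lives in the helper, in userspace, and can be as narrow as
the particular program needs.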