On Fri, Jul 7, 2023 at 6:04 AM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
>
> Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> writes:
>
> > On Thu, Jul 6, 2023 at 4:32 AM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
> >>
> >> Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> writes:
> >>
> >> > Having it as a separate single-purpose FS seems cleaner, because we
> >> > have use cases where we'd have one BPF FS instance created for a
> >> > container by our container manager, and then exposing a few separate
> >> > tokens with different sets of allowed functionality. E.g., one for
> >> > main intended workload, another for some BPF-based observability
> >> > tools, maybe yet another for more heavy-weight tools like bpftrace for
> >> > extra debugging. In the debugging case our container infrastructure
> >> > will be "evacuating" any other workloads on the same host to avoid
> >> > unnecessary consequences. The point is to not disturb
> >> > workload-under-human-debugging as much as possible, so we'd like to
> >> > keep userns intact, which is why mounting extra (more permissive) BPF
> >> > token inside already running containers is an important consideration.
> >>
> >> This example (as well as Yafang's in the sibling subthread) makes it
> >> even more apparent to me that it would be better with a model where the
> >> userspace policy daemon can just make decisions on each call directly,
> >> instead of mucking about with different tokens with different embedded
> >> permissions. Why not go that route (see my other reply for details on
> >> what I mean)?
> >
> > I don't know how you arrived at this conclusion,
>
> Because it makes it apparent that you're basically building a policy
> engine in the kernel with this...

I disagree that this is a policy engine in the kernel. It's a building
block for delegation and enforcement. The policy itself is implemented
in user space by a privileged process that decides when to issue BPF
tokens and with which configuration. And, optionally and if necessary,
access can be restricted further using BPF LSM in a more fine-grained
and dynamic way.

>
> > but we've debated BPF proxying and separate service at length, there
> > is no point in going on another round here.
>
> You had some objections to explicit proxying via RPC calls; I suggested
> a way of avoiding that by keeping the kernel in the loop, which you have

I thought we settled the seccomp notify proposal?

> not responded to. If you're just going to go ahead with your solution
> over any objections you could just have stated so from the beginning and
> saved us all a lot of time :/

It would also be good to understand that yours is but one opinion among
several. If you read the thread carefully you'll see that other people
have differing opinions, and yours doesn't necessarily have to be the
deciding one.

I appreciate the feedback, but I don't appreciate the expectation that
your feedback is binding in any way.

>
> Can we at least put this thing behind a kconfig option, so we can turn
> it off in distro kernels?

Why can't a distro disable this in some more dynamic way, though? With
the existing LSM mechanism, a sysctl, whatever? I think it would be
useful to let users have control over this and decide for themselves
without having to rebuild a custom kernel.
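To make the "existing LSM mechanism" option a bit more concrete, here is
a minimal, purely illustrative sketch of an lsm/bpf program that a
distro or admin could ship and attach at boot to refuse BPF token
creation system-wide, without any kernel rebuild. It assumes a kernel
with CONFIG_BPF_LSM and the BPF_TOKEN_CREATE command this series adds;
the constant below is just a placeholder for whatever the series'
uapi/linux/bpf.h ends up defining:

  /* purely illustrative: a system-wide "off switch" for BPF tokens,
   * implemented with the existing BPF LSM machinery instead of a
   * kconfig knob; hypothetical, not part of this series
   */
  #include "vmlinux.h"
  #include <bpf/bpf_helpers.h>
  #include <bpf/bpf_tracing.h>

  char LICENSE[] SEC("license") = "GPL";

  /* placeholder for the enum value from this series' uapi/linux/bpf.h */
  #define BPF_TOKEN_CREATE_CMD 36

  SEC("lsm/bpf")
  int BPF_PROG(deny_token_create, int cmd, union bpf_attr *attr,
               unsigned int size)
  {
          if (cmd == BPF_TOKEN_CREATE_CMD)
                  return -EPERM; /* refuse to mint BPF tokens on this host */
          return 0; /* don't interfere with any other bpf() command */
  }

Something like this could ship as part of a distro's default policy and
be dropped by users who do want token delegation, which seems strictly
more flexible than a compile-time switch.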
>
> > Per-call decisions can be achieved nicely by employing BPF LSM in a
> > restrictive manner on top of BPF token (or no token, if you are ok
> > without user namespaces).
>
> Building a deficient security delegation mechanism and saying "you can
> patch things up using an LSM" is a terrible design, though. Also, this

A bunch of people disagree with you.

> still means you have to implement all the policy checks in the kernel
> (just in BPF) which is awkward at best.

"Patching things up" with a restrictive LSM, where necessary, is exactly
the model LSM folks prefer. You are also assuming that it's always
necessary, and I'm saying that in lots of practical contexts an LSM
won't even be necessary.

>
> -Toke
>
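And to make the "per-call decisions via BPF LSM in a restrictive manner
on top of BPF token" point concrete as well, here is another minimal,
hypothetical sketch (again, not part of this series): the token decides
whether the workload may use bpf() from inside its user namespace at
all, and an lsm/bpf program then narrows what each individual call is
allowed to do, e.g. by allowlisting program types for BPF_PROG_LOAD. A
real policy would of course scope this to the intended workload (by
cgroup, namespace, etc.) rather than applying it host-wide:

  /* hypothetical sketch of a restrictive per-call policy layered on
   * top of whatever a BPF token delegates; the allowlist is just an
   * example of what one particular workload might be granted
   */
  #include "vmlinux.h"
  #include <bpf/bpf_helpers.h>
  #include <bpf/bpf_tracing.h>

  char LICENSE[] SEC("license") = "GPL";

  SEC("lsm/bpf")
  int BPF_PROG(restrict_prog_load, int cmd, union bpf_attr *attr,
               unsigned int size)
  {
          if (cmd != BPF_PROG_LOAD)
                  return 0; /* leave other commands to other policy layers */

          switch (attr->prog_type) {
          case BPF_PROG_TYPE_SOCKET_FILTER: /* example allowlist entries */
          case BPF_PROG_TYPE_CGROUP_SKB:
                  return 0;
          default:
                  return -EPERM; /* deny any other program type per call */
          }
  }

That is the kind of thing I mean by LSM being an optional, more dynamic
layer on top of the delegation mechanism, not a replacement for it.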