Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> writes:

> On Thu, Jul 6, 2023 at 4:32 AM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
>>
>> Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> writes:
>>
>> > Having it as a separate single-purpose FS seems cleaner, because we
>> > have use cases where we'd have one BPF FS instance created for a
>> > container by our container manager, and then exposing a few separate
>> > tokens with different sets of allowed functionality. E.g., one for
>> > main intended workload, another for some BPF-based observability
>> > tools, maybe yet another for more heavy-weight tools like bpftrace for
>> > extra debugging. In the debugging case our container infrastructure
>> > will be "evacuating" any other workloads on the same host to avoid
>> > unnecessary consequences. The point is to not disturb
>> > workload-under-human-debugging as much as possible, so we'd like to
>> > keep userns intact, which is why mounting extra (more permissive) BPF
>> > token inside already running containers is an important consideration.
>>
>> This example (as well as Yafang's in the sibling subthread) makes it
>> even more apparent to me that it would be better with a model where the
>> userspace policy daemon can just make decisions on each call directly,
>> instead of mucking about with different tokens with different embedded
>> permissions. Why not go that route (see my other reply for details on
>> what I mean)?
>
> I don't know how you arrived at this conclusion,

Because it makes it apparent that you're basically building a policy
engine in the kernel with this...

> but we've debated BPF proxying and separate service at length, there
> is no point in going on another round here.

You had some objections to explicit proxying via RPC calls; I suggested a
way of avoiding that by keeping the kernel in the loop, which you have
not responded to. If you're just going to go ahead with your solution
over any objections you could just have stated so from the beginning and
saved us all a lot of time :/

Can we at least put this thing behind a kconfig option, so we can turn
it off in distro kernels?

> Per-call decisions can be achieved nicely by employing BPF LSM in a
> restrictive manner on top of BPF token (or no token, if you are ok
> without user namespaces).

Building a deficient security delegation mechanism and saying "you can
patch things up using an LSM" is a terrible design, though. Also, this
still means you have to implement all the policy checks in the kernel
(just in BPF) which is awkward at best.

-Toke