On Thu, Jun 22, 2023 at 1:23 AM Maryam Tahhan <mtahhan@xxxxxxxxxx> wrote:
>
> On 22/06/2023 00:48, Andrii Nakryiko wrote:
>
> >
> >>>> Giving a way to enable BPF in a container is only a small part of the overall task -- making BPF behave sensibly in that container seems like it should also be necessary.
> >>> BPF is still a privileged thing. You can't just say that any
> >>> unprivileged application should be able to use BPF. That's why BPF
> >>> token is about trusting unpriv application in a controlled environment
> >>> (production) to not do something crazy. It can be enforced further
> >>> through LSM usage, but in a lot of cases, when dealing with internal
> >>> production applications it's enough to have a proper application
> >>> design and rely on code review process to avoid any negative effects.
> >> We really shouldn’t be creating new kinds of privileged containers that do uncontained things.
> >>
> >> If you actually want to go this route, I think you would do much better to introduce a way for a container manager to usefully proxy BPF on behalf of the container.
> > Please see Hao's reply ([0]) about his and Google's (not so rosy)
> > experiences with building and using such BPF proxy. We (Meta)
> > internally didn't go this route at all and strongly prefer not to.
> > There are lots of downsides and complications to having a BPF proxy.
> > In the end, this is just shuffling around where the decision about
> > trusting a given application with BPF access is being made. BPF proxy
> > adds lots of unnecessary logistical, operational, and development
> > complexity, but doesn't magically make anything safer.
> >
> > [0] https://lore.kernel.org/bpf/CA+khW7h95RpurRL8qmKdSJQEXNYuqSWnP16o-uRZ9G0KqCfM4Q@xxxxxxxxxxxxxx/
> >
> Apologies for being blunt, but the token approach to me seems to be a
> work around providing the right level/classification for a pod/container
> in order to say you support unprivileged containers using eBPF. I think
> if your container needs to do privileged things it should have and be
> classified with the right permissions (privileges) to do what it needs
> to do.

For one, when user namespaces are involved, there is no BPF use at all, no matter how privileged you mark the container. I mentioned this in the cover letter.

Now, the claim is that user namespaces are indeed useful and necessary, and yet we also want such user-namespaced applications to be able to use BPF. Currently there is no solution to that. An external BPF service is not a great one either; see [0] for real-world users' feedback.

  [0] https://lore.kernel.org/bpf/CA+khW7h95RpurRL8qmKdSJQEXNYuqSWnP16o-uRZ9G0KqCfM4Q@xxxxxxxxxxxxxx/

>
> The proxy BPF on behalf of the container approach works for containers
> that don't need to do privileged BPF operations.

BPF usage *is privileged* in all but a few tiny use cases that are OK with heavily limited unprivileged BPF functionality (and even then, the recommendation is to disable unprivileged BPF altogether). Whether you proxy such privileged BPF usage through an external application or grant a BPF token to the application, it is the same category of decision: someone has to decide to trust the application to perform privileged BPF operations.

And the only debatable thing here is whether the application itself should make bpf() syscalls directly and be able to use the entire BPF ecosystem of libraries, tools, techniques, and approaches, or whether we go and rewrite the world to use some RPC-based proxy for the bpf() syscall. To put it bluntly, the latter is not a realistic (or even good) option.

>
> I have to say that the `proxy BPF on behalf of the container` meets the
> needs of unprivileged pods and at the same time giving CAP_BPF to the

I tried to make it very clear in the cover letter, but granting CAP_BPF under a user namespace means precisely nothing. CAP_BPF is only useful in the init user namespace.
> applications meets the needs of these PODs that need to do
> privileged/bpf things without any tokens. Ultimately you are trusting
> these apps in the same way as if you were granting a token.

Yes, absolutely. As I mentioned very explicitly, it's a question of trusting the application. Service vs. token is an implementation detail, but one with huge implications for how applications are built, tested, versioned, deployed, etc.

>
> >
> >>> So privileged daemon (container manager) will be configured with the
> >>> knowledge of which services/containers are allowed to use BPF, and
> >>> will grant BPF token only to those that were explicitly allowlisted.