On Thu, Jun 22, 2023 at 9:50 AM Andy Lutomirski <luto@xxxxxxxxxx> wrote:
>
>
>
> On Thu, Jun 22, 2023, at 1:22 AM, Maryam Tahhan wrote:
> > On 22/06/2023 00:48, Andrii Nakryiko wrote:
> >>
> >>>>> Giving a way to enable BPF in a container is only a small part of the overall task -- making BPF behave sensibly in that container seems like it should also be necessary.
> >>>> BPF is still a privileged thing. You can't just say that any
> >>>> unprivileged application should be able to use BPF. That's why BPF
> >>>> token is about trusting unpriv application in a controlled environment
> >>>> (production) to not do something crazy. It can be enforced further
> >>>> through LSM usage, but in a lot of cases, when dealing with internal
> >>>> production applications it's enough to have a proper application
> >>>> design and rely on code review process to avoid any negative effects.
> >>> We really shouldn’t be creating new kinds of privileged containers that do uncontained things.
> >>>
> >>> If you actually want to go this route, I think you would do much better to introduce a way for a container manager to usefully proxy BPF on behalf of the container.
> >> Please see Hao's reply ([0]) about his and Google's (not so rosy)
> >> experiences with building and using such BPF proxy. We (Meta)
> >> internally didn't go this route at all and strongly prefer not to.
> >> There are lots of downsides and complications to having a BPF proxy.
> >> In the end, this is just shuffling around where the decision about
> >> trusting a given application with BPF access is being made. BPF proxy
> >> adds lots of unnecessary logistical, operational, and development
> >> complexity, but doesn't magically make anything safer.
> >>
> >> [0] https://lore.kernel.org/bpf/CA+khW7h95RpurRL8qmKdSJQEXNYuqSWnP16o-uRZ9G0KqCfM4Q@xxxxxxxxxxxxxx/
> >>
> > Apologies for being blunt, but the token approach to me seems to be a
> > work around providing the right level/classification for a pod/container
> > in order to say you support unprivileged containers using eBPF. I think
> > if your container needs to do privileged things it should have and be
> > classified with the right permissions (privileges) to do what it needs
> > to do.
>
> Bluntness is great.
>
> I think that this whole level/classification thing is utterly wrong. Replace "BPF" with basically anything else, and you'll see how absurd it is.

BPF is not "anything else"; it's important to understand that BPF is
inherently not compartmentalizable. And it's vast and generic in its
capabilities. This changes everything. So your analogies are
misleading.

>
> "the token approach to me seems like a work around providing the right level/classification for a pod/container in order to say you support unprivileged containers using files on disk"
>
> That's very 1990's. Maybe 1980's. Of *course* giving access to a filesystem has some inherent security exposure. So we can give containers access to *different* filesystems. Or we can use ACLs. Or MAC policy. Or whatever. We have many solutions, none of which are perfect, and we're doing okay.
>
> "the token approach to me seems like a work around providing the right level/classification for a pod/container in order to say you support unprivileged containers using the network"
>
> The network is a big deal. For some reason, it's cool these days to treat TCP as highly privileged. You can get secrets from your favorite (or least favorite) cloud provider with unauthenticated HTTP to a magic IP and port. You can bypass a whole lot of authenticating/authorizing proxies with unauthenticated HTTP (no TLS!) if you're on the right network.
>
> This is IMO obnoxious, but we deal with it by having network namespaces and firewalls and rather outdated port <= 1024 rules.
>
> "the token approach to me seems like a work around providing the right level/classification for a pod/container in order to say you support unprivileged containers using BPF"
>
> My response is: what's wrong with BPF? BPF has maps and programs and such, and we could easily apply 1990's style ownership and DAC rules to them.

Can you apply DAC rules to which kernel events a BPF program can be
run on? Can you apply DAC rules to which in-kernel data structures a
BPF program can look at and make sure that it doesn't access a
task/socket/etc that "belongs" to some other container/user/etc?

Can we limit XDP or AF_XDP BPF programs from seeing and controlling
network traffic that will be eventually routed to a container that
the XDP program "should not" have access to? Without making
everything so slow that it's useless?

> I even *wrote the code*.

Did you submit it upstream for review and wide discussion? Did you
test it and integrate it with production workloads to prove that your
solution is actually a viable real-world solution and not a toy?
Writing the code doesn't mean solving the problem.

> But for some reason, the BPF community wants to bury its head in the sand, pretend it's 1980, declare that BPF is too privileged to have access control, and instead just have a complicated switch to turn it on and off in different contexts.

I won't speak on behalf of the entire BPF community, but I'm trying to
explain that BPF cannot be reasonably sandboxed and has to be
privileged due to its global nature. And I haven't yet seen any
realistic counter-proposal to change that. And it's not about
ownership of the BPF map or BPF program, it's way beyond that.

>
> Please try harder.

Well, maybe there is something in that "some reason" you mentioned
above that you so quickly dismissed?