On Mon, Jun 19, 2023 at 10:40 AM Andy Lutomirski <luto@xxxxxxxxxx> wrote:
>
>
>
> On Fri, Jun 9, 2023, at 12:08 PM, Andrii Nakryiko wrote:
> > On Fri, Jun 9, 2023 at 11:32 AM Andy Lutomirski <luto@xxxxxxxxxx> wrote:
> >>
> >> On Wed, Jun 7, 2023, at 4:53 PM, Andrii Nakryiko wrote:
> >> > This patch set introduces new BPF object, BPF token, which allows to delegate
> >> > a subset of BPF functionality from privileged system-wide daemon (e.g.,
> >> > systemd or any other container manager) to a *trusted* unprivileged
> >> > application. Trust is the key here. This functionality is not about allowing
> >> > unconditional unprivileged BPF usage. Establishing trust, though, is
> >> > completely up to the discretion of respective privileged application that
> >> > would create a BPF token.
> >> >
> >>
> >> I skimmed the description and the LSFMM slides.
> >>
> >> Years ago, I sent out a patch set to start down the path of making the bpf() API make sense when used in less-privileged contexts (regarding access control of BPF objects and such). It went nowhere.
> >>
> >> Where does BPF token fit in? Does a kernel with these patches applied actually behave sensibly if you pass a BPF token into a container?
> >
> > Yes?.. In the sense that it is possible to create BPF programs and BPF
> > maps from inside the container (with BPF token). Right now under user
> > namespace it's impossible no matter what you do.
>
> I have no problem with creating BPF maps inside a container, but I think the maps should *be in the container*.
>
> My series wasn’t about unprivileged BPF per se. It was about updating the existing BPF permission model so that it made sense in a context in which it had multiple users that didn’t trust each other.

I don't think it's possible with BPF, in principle, as I mentioned in the cover letter. Even if some particular types of programs could be "contained" in some sense, in general BPF is too global by its nature (it observes everything in kernel memory, it can influence system-wide behaviors, etc).

>
> >
> >> Giving a way to enable BPF in a container is only a small part of the overall task -- making BPF behave sensibly in that container seems like it should also be necessary.
> >
> > BPF is still a privileged thing. You can't just say that any
> > unprivileged application should be able to use BPF. That's why BPF
> > token is about trusting unpriv application in a controlled environment
> > (production) to not do something crazy. It can be enforced further
> > through LSM usage, but in a lot of cases, when dealing with internal
> > production applications it's enough to have a proper application
> > design and rely on code review process to avoid any negative effects.
>
> We really shouldn’t be creating new kinds of privileged containers that do uncontained things.
>
> If you actually want to go this route, I think you would do much better to introduce a way for a container manager to usefully proxy BPF on behalf of the container.

Please see Hao's reply ([0]) about his and Google's (not so rosy) experiences with building and using such BPF proxy. We (Meta) internally didn't go this route at all and strongly prefer not to. There are lots of downsides and complications to having a BPF proxy. In the end, this is just shuffling around where the decision about trusting a given application with BPF access is being made. BPF proxy adds lots of unnecessary logistical, operational, and development complexity, but doesn't magically make anything safer.

[0] https://lore.kernel.org/bpf/CA+khW7h95RpurRL8qmKdSJQEXNYuqSWnP16o-uRZ9G0KqCfM4Q@xxxxxxxxxxxxxx/

>
> >
> > So privileged daemon (container manager) will be configured with the
> > knowledge of which services/containers are allowed to use BPF, and
> > will grant BPF token only to those that were explicitly allowlisted.
>