On Wed, Jun 14, 2023 at 2:39 AM Christian Brauner <brauner@xxxxxxxxxx> wrote: > > On Wed, Jun 14, 2023 at 02:23:02AM +0200, Djalal Harouni wrote: > > On Tue, Jun 13, 2023 at 12:27 AM Andrii Nakryiko > > <andrii.nakryiko@xxxxxxxxx> wrote: > > > > > > On Mon, Jun 12, 2023 at 5:02 AM Djalal Harouni <tixxdz@xxxxxxxxx> wrote: > > > > > > > > On Sat, Jun 10, 2023 at 12:57 AM Andrii Nakryiko > > > > <andrii.nakryiko@xxxxxxxxx> wrote: > > > > > > > > > > On Fri, Jun 9, 2023 at 3:30 PM Djalal Harouni <tixxdz@xxxxxxxxx> wrote: > > > > > > > > > > > > Hi Andrii, > > > > > > > > > > > > On Thu, Jun 8, 2023 at 1:54 AM Andrii Nakryiko <andrii@xxxxxxxxxx> wrote: > > > > > > > > > > > > > > ... > > > > > > > creating new BPF objects like BPF programs, BPF maps, etc. > > > > > > > > > > > > Is there a reason for coupling this only with the userns? > > > > > > > > > > There is no coupling. Without userns it is at least possible to grant > > > > > CAP_BPF and other capabilities from init ns. With user namespace that > > > > > becomes impossible. > > > > > > > > But these are not the same: delegate full cap vs delegate an fd mask? > > > > > > What FD mask are we talking about here? I don't recall us talking > > > about any FD masks, so this one is a bit confusing without more > > > context. > > > > Ah err, sorry yes referring to fd token (which I assumed is a mask of > > allowed operations or something like that). > > > > So I want the possibility to delegate the fd token in the init userns. > > > > > > > > > > One can argue unprivileged in init userns is the same privileged in > > > > nested userns > > > > Getting to delegate fd in init userns, then in nested ones seems logical... > > > > > > Again, sorry, I'm not following. Can you please elaborate what you mean? > > > > I mean can we use the fd token in the init user namespace too? not > > only in the nested user namespaces but in the first one? Sorry I > > didn't check the code. > > [...] > > > > > > > > > > Having the fd or "token" that gives access rights pinned in two > > > > separate bpffs mounts seems too much, it crosses namespaces (mount, > > > > userns etc), environments setup by privileged... > > > > > > See above, there is nothing namespaceable about BPF itself, and BPF > > > token as well. If some production setup benefits from pinning one BPF > > > token in multiple places, I don't see the problem with that. > > > > > > > > > > > I would just make it per bpffs mount and that's it, nothing more. If a > > > > program wants to bind mount it somewhere else then it's not a bpf > > > > problem. > > > > > > And if some application wants to pin BPF token, why would that be BPF > > > subsystem's problem as well? > > > > The credentials, capabilities, keyring, different namespaces, etc are > > all attached to the owning user namespace, if the BPF subsystem goes > > its own way and creates a token to split up CAP_BPF without following > > that model, then it's definitely a BPF subsystem problem... I don't > > recommend that. > > > > Feels it's going more of a system-wide approach opening BPF > > functionality where ultimately it clashes with the argument: delegate > > a subset of BPF functionality to a *trusted* unprivileged application. > > My reading of delegation is within a container/service hierarchy > > nothing more. > > You're making the exact arguments that Lennart, Aleksa, and I have been > making in the LSFMM presentation about this topic. It's even recorded: Alright, so (I think) I get a pretty good feel now for what the main concerns are, and why people are trying to push this to be an FS. And it's not so much that BPF token grants bpf() syscall usage to unpriv (but trusted) workloads or that BPF itself is not namespaceable. The main worry is that BPF token, once issues, could be illegally/uncontrollably passed outside of container, intentionally or not. And by having this association with mount namespace (through BPF FS) we automatically limit the sharing to only contain that has access to that BPF FS. So I agree that it makes sense to have this mount namespace association, but I also would like to keep BPF token to be a separate entity from BPF FS itself, and have the ability to have multiple different BPF tokens exposed in a single BPF FS instance. I think the latter is important. So how about this slight modification: when a BPF token is created using BPF_TOKEN_CREATE command, the user has to provide an FD for "associated" BPF FS instance (superblock). What that does is allows BPF token to be created with BPF FS and/or mount namespace association set in stone. After that BPF token can only be pinned in that BPF FS instance and cannot leave the boundaries of that mount namespace (specific details to be worked out, this is new area for me, so I'm sorry if I'm missing nuances). What this slight tweak gives us is that we can still have multiple BPF token instances within a single BPF FS. It is still pinnable/gettable through common bpf() syscall's BPF_OBJ_PIN/BPF_OBJ_GET commands. You still can have more nuances file permission and getting BPF token can be controlled further through LSM. Also we still get to use an extensible and familiar (to BPF users) bpf_attr binary approach. Basically, it is very much native to BPF subsystem, but it is mount namespace-bound like was requested by proponents of merging BPF token and BPF FS together. I assume that this BPF FS fd can be fetched using fsopen() or fspick() syscalls, is that right? WDYT? Does that sound like it would address all the above concerns? Please point to any important details I might be missing (as I mentioned, very unfamiliar territory). > > https://youtu.be/4CCRTWEZLpw?t=1546 > > So we fully agree with you here. I actually just rewatched that entire discussion. :) And after talking about BPF token at length in the halls of the conference and email discussions on this patch set, it was very useful to relisten (again) all the finer points that were made back then. Thanks for the remainder and the link.