Christian Brauner <brauner@xxxxxxxxxx> writes:

> On Wed, Jul 05, 2023 at 09:20:28AM +0200, Daniel Borkmann wrote:
>> On 7/5/23 1:28 AM, Toke Høiland-Jørgensen wrote:
>> > Christian Brauner <brauner@xxxxxxxxxx> writes:
>> > > On Wed, Jun 28, 2023 at 10:18:19PM -0700, Andrii Nakryiko wrote:
>> > > > Add new kind of BPF kernel object, BPF token. BPF token is meant to
>> > > > allow delegating privileged BPF functionality, like loading a BPF
>> > > > program or creating a BPF map, from privileged process to a *trusted*
>> > > > unprivileged process, all while having a good amount of control over
>> > > > which privileged operations could be performed using provided BPF token.
>> > > >
>> > > > This patch adds new BPF_TOKEN_CREATE command to bpf() syscall, which
>> > > > allows to create a new BPF token object along with a set of allowed
>> > > > commands that such BPF token allows to unprivileged applications.
>> > > > Currently only BPF_TOKEN_CREATE command itself can be
>> > > > delegated, but other patches gradually add ability to delegate
>> > > > BPF_MAP_CREATE, BPF_BTF_LOAD, and BPF_PROG_LOAD commands.
>> > > >
>> > > > The above means that new BPF tokens can be created using existing BPF
>> > > > token, if original privileged creator allowed BPF_TOKEN_CREATE command.
>> > > > New derived BPF token cannot be more powerful than the original BPF
>> > > > token.
>> > > >
>> > > > Importantly, BPF token is automatically pinned at the specified location
>> > > > inside an instance of BPF FS and cannot be repinned using BPF_OBJ_PIN
>> > > > command, unlike BPF prog/map/btf/link. This provides more control over
>> > > > unintended sharing of BPF tokens through pinning it in another BPF FS
>> > > > instances.
>> > > >
>> > > > Signed-off-by: Andrii Nakryiko <andrii@xxxxxxxxxx>
>> > > > ---
>> > >
>> > > The main issue I have with the token approach is that it is a completely
>> > > separate delegation vector on top of user namespaces.
>> > > We mentioned this during the conf and this was brought up on the
>> > > thread here again as well.
>> > > Imho, that's a problem both security-wise and complexity-wise.
>> > >
>> > > It's not great if each subsystem gets its own custom delegation
>> > > mechanism. This imposes such a taxing complexity on both kernel- and
>> > > userspace that it will quickly become a huge liability. So I would
>> > > really strongly encourage you to explore another direction.
>> >
>> > I share this concern as well, but I'm not quite sure I follow your
>> > proposal here. IIUC, you're saying that instead of creating the token
>> > using a BPF_TOKEN_CREATE command, the policy daemon should create a
>> > bpffs instance and attach the token value directly to that, right? But
>> > then what? Are you proposing that the calling process inside the
>> > container open a filesystem reference (how? using fspick()?) and pass
>> > that to the bpf syscall? Or is there some way to find the right
>> > filesystem instance to extract this from at the time that the bpf()
>> > syscall is issued inside the container?
>>
>> Given there can be multiple bpffs instances, it would have to be similar
>> as to what Andrii did in that you need to pass the fd to the bpf(2) for
>> prog/map creation in order to retrieve the opts->abilities from the super
>> block.
>
> I think it's pretty flexible what one can do here. Off the top of my
> head there could be a dedicated file like /sys/fs/bpf/delegate which
> only exists if delegation has been enabled. Though that might be just a
> wasted inode. There could be a new ioctl() on bpffs which has the same
> effect.
>
> Probably an ioctl() on the bpffs instance is easier to grok. You could
> even take away rights granted by a bpffs instance from such an fd via
> additional ioctl() on it.

Right, gotcha; I was missing whether there was an existing mechanism to
obtain this; an ioctl makes sense.
I can see the utility in attaching this to the file system instance
instead of as a separate object that's pinned (but see my post in the
other subthread about using the "ask userspace" model instead).

-Toke