Re: [PATCH RESEND v3 bpf-next 01/14] bpf: introduce BPF token object

Christian Brauner <brauner@xxxxxxxxxx> · Tue, 4 Jul 2023 15:34:54 +0200

On Tue, Jul 04, 2023 at 02:43:59PM +0200, Christian Brauner wrote:
> On Wed, Jun 28, 2023 at 10:18:19PM -0700, Andrii Nakryiko wrote:
> > Add new kind of BPF kernel object, BPF token. BPF token is meant to to
> > allow delegating privileged BPF functionality, like loading a BPF
> > program or creating a BPF map, from privileged process to a *trusted*
> > unprivileged process, all while have a good amount of control over which
> > privileged operations could be performed using provided BPF token.
> > 
> > This patch adds new BPF_TOKEN_CREATE command to bpf() syscall, which
> > allows to create a new BPF token object along with a set of allowed
> > commands that such BPF token allows to unprivileged applications.
> > Currently only BPF_TOKEN_CREATE command itself can be
> > delegated, but other patches gradually add ability to delegate
> > BPF_MAP_CREATE, BPF_BTF_LOAD, and BPF_PROG_LOAD commands.
> > 
> > The above means that new BPF tokens can be created using existing BPF
> > token, if original privileged creator allowed BPF_TOKEN_CREATE command.
> > New derived BPF token cannot be more powerful than the original BPF
> > token.
> > 
> > Importantly, BPF token is automatically pinned at the specified location
> > inside an instance of BPF FS and cannot be repinned using BPF_OBJ_PIN
> > command, unlike BPF prog/map/btf/link. This provides more control over
> > unintended sharing of BPF tokens through pinning it in another BPF FS
> > instances.
> > 
> > Signed-off-by: Andrii Nakryiko <andrii@xxxxxxxxxx>
> > ---
> 
> The main issue I have with the token approach is that it is a completely
> separate delegation vector on top of user namespaces. We mentioned this
> duringthe conf and this was brought up on the thread here again as well.
> Imho, that's a problem both security-wise and complexity-wise.
> 
> It's not great if each subsystem gets its own custom delegation
> mechanism. This imposes such a taxing complexity on both kernel- and
> userspace that it will quickly become a huge liability. So I would
> really strongly encourage you to explore another direction.
> 
> I do think the spirit of your proposal is workable and that it can
> mostly be kept in tact.
> 
> As mentioned before, bpffs has all the means to be taught delegation:
> 
>         // In container's user namespace
>         fd_fs = fsopen("bpffs");
> 
>         // Delegating task in host userns (systemd-bpfd whatever you want)
>         ret = fsconfig(fd_fs, FSCONFIG_SET_FLAG, "delegate", ...);
> 
>         // In container's user namespace
>         fd_mnt = fsmount(fd_fs, 0);
> 
>         ret = move_mount(fd_fs, "", -EBADF, "/my/fav/location", MOVE_MOUNT_F_EMPTY_PATH)
> 
> Roughly, this would mean:
> 
> (i) raise FS_USERNS_MOUNT on bpffs but guard it behind the "delegate"
>     mount option. IOW, it's only possibly to mount bpffs as an
>     unprivileged user if a delegating process like systemd-bpfd with
>     system-level privileges has marked it as delegatable.
> (ii) add fine-grained delegation options that you want this
>      bpffs instance to allow via new mount options. Idk,
> 
>      // allow usage of foo
>      fsconfig(fd_fs, FSCONFIG_SET_STRING, "abilities", "foo");
> 
>      // also allow usage of bar
>      fsconfig(fd_fs, FSCONFIG_SET_STRING, "abilities", "bar");
> 
>      // reset allowed options
>      fsconfig(fd_fs, FSCONFIG_SET_STRING, "");
> 
>      // allow usage of schmoo
>      fsconfig(fd_fs, FSCONFIG_SET_STRING, "abilities", "schmoo");

This is really just one crummy way of doing this. It's ofc possible to
make this a binary struct if you wanted to; of any form:

struct bpf_delegation_opts {
	u64 a;
	u64 b;
	u64 c;
	u32 d;
	u32 e;
};

and then

struct bpf_delegation_opts opts = {
	.a = SOMETHING_SOMETHING,
	.d = SOMETHING_SOMETHING_ELSE,
};

fsconfig(fd_fs, FSCONFIG_SET_BINARY, "abilities", &opts, sizeof(opts));

you'll get:

param->size == sizeof(opts);
param->blob = memdup_user_nul();

and then you can version this by size like we do for extensible structs
and change whatever you'd like to change in the future.