On Tue, Jul 4, 2023 at 8:44 AM Christian Brauner <brauner@xxxxxxxxxx> wrote: > On Wed, Jun 28, 2023 at 10:18:19PM -0700, Andrii Nakryiko wrote: > > Add new kind of BPF kernel object, BPF token. BPF token is meant to to > > allow delegating privileged BPF functionality, like loading a BPF > > program or creating a BPF map, from privileged process to a *trusted* > > unprivileged process, all while have a good amount of control over which > > privileged operations could be performed using provided BPF token. > > > > This patch adds new BPF_TOKEN_CREATE command to bpf() syscall, which > > allows to create a new BPF token object along with a set of allowed > > commands that such BPF token allows to unprivileged applications. > > Currently only BPF_TOKEN_CREATE command itself can be > > delegated, but other patches gradually add ability to delegate > > BPF_MAP_CREATE, BPF_BTF_LOAD, and BPF_PROG_LOAD commands. > > > > The above means that new BPF tokens can be created using existing BPF > > token, if original privileged creator allowed BPF_TOKEN_CREATE command. > > New derived BPF token cannot be more powerful than the original BPF > > token. > > > > Importantly, BPF token is automatically pinned at the specified location > > inside an instance of BPF FS and cannot be repinned using BPF_OBJ_PIN > > command, unlike BPF prog/map/btf/link. This provides more control over > > unintended sharing of BPF tokens through pinning it in another BPF FS > > instances. > > > > Signed-off-by: Andrii Nakryiko <andrii@xxxxxxxxxx> > > --- > > The main issue I have with the token approach is that it is a completely > separate delegation vector on top of user namespaces. We mentioned this > duringthe conf and this was brought up on the thread here again as well. > Imho, that's a problem both security-wise and complexity-wise. > > It's not great if each subsystem gets its own custom delegation > mechanism. This imposes such a taxing complexity on both kernel- and > userspace that it will quickly become a huge liability. So I would > really strongly encourage you to explore another direction. > > I do think the spirit of your proposal is workable and that it can > mostly be kept in tact. > > As mentioned before, bpffs has all the means to be taught delegation: > > // In container's user namespace > fd_fs = fsopen("bpffs"); > > // Delegating task in host userns (systemd-bpfd whatever you want) > ret = fsconfig(fd_fs, FSCONFIG_SET_FLAG, "delegate", ...); > > // In container's user namespace > fd_mnt = fsmount(fd_fs, 0); > > ret = move_mount(fd_fs, "", -EBADF, "/my/fav/location", MOVE_MOUNT_F_EMPTY_PATH) > > Roughly, this would mean: > > (i) raise FS_USERNS_MOUNT on bpffs but guard it behind the "delegate" > mount option. IOW, it's only possibly to mount bpffs as an > unprivileged user if a delegating process like systemd-bpfd with > system-level privileges has marked it as delegatable. > (ii) add fine-grained delegation options that you want this > bpffs instance to allow via new mount options. Idk, > > // allow usage of foo > fsconfig(fd_fs, FSCONFIG_SET_STRING, "abilities", "foo"); > > // also allow usage of bar > fsconfig(fd_fs, FSCONFIG_SET_STRING, "abilities", "bar"); > > // reset allowed options > fsconfig(fd_fs, FSCONFIG_SET_STRING, ""); > > // allow usage of schmoo > fsconfig(fd_fs, FSCONFIG_SET_STRING, "abilities", "schmoo"); > > This all seems more intuitive and integrates with user and mount > namespaces of the container. This can also work for restricting > non-userns bpf instances fwiw. You can also share instances via > bind-mount and so on. The userns of the bpffs instance can also be used > for permission checking provided a given functionality has been > delegated by e.g., systemd-bpfd or whatever. I have no arguments against any of the above, and would prefer to see something like this over a token-based mechanism. However we do want to make sure we have the proper LSM control points for either approach so that admins who rely on LSM-based security policies can manage delegation via their policies. Using the fsconfig() approach described by Christian above, I believe we should have the necessary hooks already in security_fs_context_parse_param() and security_sb_mnt_opts() but I'm basing that on a quick look this morning, some additional checking would need to be done. -- paul-moore.com