Re: [PATCH RESEND v3 bpf-next 01/14] bpf: introduce BPF token object

Toke Høiland-Jørgensen <toke@xxxxxxxxxx> · Wed, 05 Jul 2023 14:34:28 +0200

Christian Brauner <brauner@xxxxxxxxxx> writes:

> On Wed, Jul 05, 2023 at 09:20:28AM +0200, Daniel Borkmann wrote:
>> On 7/5/23 1:28 AM, Toke Høiland-Jørgensen wrote:
>> > Christian Brauner <brauner@xxxxxxxxxx> writes:
>> > > On Wed, Jun 28, 2023 at 10:18:19PM -0700, Andrii Nakryiko wrote:
>> > > > Add new kind of BPF kernel object, BPF token. BPF token is meant to to
>> > > > allow delegating privileged BPF functionality, like loading a BPF
>> > > > program or creating a BPF map, from privileged process to a *trusted*
>> > > > unprivileged process, all while have a good amount of control over which
>> > > > privileged operations could be performed using provided BPF token.
>> > > > 
>> > > > This patch adds new BPF_TOKEN_CREATE command to bpf() syscall, which
>> > > > allows to create a new BPF token object along with a set of allowed
>> > > > commands that such BPF token allows to unprivileged applications.
>> > > > Currently only BPF_TOKEN_CREATE command itself can be
>> > > > delegated, but other patches gradually add ability to delegate
>> > > > BPF_MAP_CREATE, BPF_BTF_LOAD, and BPF_PROG_LOAD commands.
>> > > > 
>> > > > The above means that new BPF tokens can be created using existing BPF
>> > > > token, if original privileged creator allowed BPF_TOKEN_CREATE command.
>> > > > New derived BPF token cannot be more powerful than the original BPF
>> > > > token.
>> > > > 
>> > > > Importantly, BPF token is automatically pinned at the specified location
>> > > > inside an instance of BPF FS and cannot be repinned using BPF_OBJ_PIN
>> > > > command, unlike BPF prog/map/btf/link. This provides more control over
>> > > > unintended sharing of BPF tokens through pinning it in another BPF FS
>> > > > instances.
>> > > > 
>> > > > Signed-off-by: Andrii Nakryiko <andrii@xxxxxxxxxx>
>> > > > ---
>> > > 
>> > > The main issue I have with the token approach is that it is a completely
>> > > separate delegation vector on top of user namespaces. We mentioned this
>> > > duringthe conf and this was brought up on the thread here again as well.
>> > > Imho, that's a problem both security-wise and complexity-wise.
>> > > 
>> > > It's not great if each subsystem gets its own custom delegation
>> > > mechanism. This imposes such a taxing complexity on both kernel- and
>> > > userspace that it will quickly become a huge liability. So I would
>> > > really strongly encourage you to explore another direction.
>> > 
>> > I share this concern as well, but I'm not quite sure I follow your
>> > proposal here. IIUC, you're saying that instead of creating the token
>> > using a BPF_TOKEN_CREATE command, the policy daemon should create a
>> > bpffs instance and attach the token value directly to that, right? But
>> > then what? Are you proposing that the calling process inside the
>> > container open a filesystem reference (how? using fspick()?) and pass
>> > that to the bpf syscall? Or is there some way to find the right
>> > filesystem instance to extract this from at the time that the bpf()
>> > syscall is issued inside the container?
>> 
>> Given there can be multiple bpffs instances, it would have to be similar
>> as to what Andrii did in that you need to pass the fd to the bpf(2) for
>> prog/map creation in order to retrieve the opts->abilities from the super
>> block.
>
> I think it's pretty flexible what one can do here. Off the top of my
> head there could be a dedicated file like /sys/fs/bpf/delegate which
> only exists if delegation has been enabled. Thought that might be just a
> wasted inode. There could be a new ioctl() on bpffsd which has the same
> effect.
>
> Probably an ioctl() on the bpffs instance is easier to grok. You could
> even take away rights granted by a bpffs instance from such an fd via
> additional ioctl() on it.

Right, gotcha; I was missing whether there was an existing mechanism to
obtain this; an ioctl makes sense. I can see the utility in attaching
this to the file system instance instead of as a separate object that's
pinned (but see my post in the other subthread about using the "ask
userspace model instead").

-Toke