Re: Question: How is BPF Token supposed to work?

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Fri, 19 Jul 2024 11:38:56 -0700

On Tue, Jul 16, 2024 at 6:46 AM Dave Tucker <dave@xxxxxxxxxxxxx> wrote:
>
> Hi,
>
>
> I’m attempting to implement BPF Token support in Aya using the implementation in
> libbpf and the kernel selftests as a reference. However, I’m hitting issues.
>
> I'm performing the following operations:
>
> 1. Creating a bpffs (using fsopen, fsconfig, fsmount) for my UID/GID 1000 with
>    “any” prog/map/cmd allowed:
>
>    $ mount | grep /tmp/bpffs
>    none on /tmp/bpffs type bpf (rw,relatime,uid=1000,gid=1000,delegate_cmds=any,delegate_maps=any,delegate_progs=any,delegate_attachs=any)
>
> 2. I’m creating a new userns with bwrap:
>
>    bwrap --unshare-user --unshare-ipc --unshare-pid --unshare-net \
>        --unshare-uts --unshare-cgroup --uid 0 --gid 0 \
>        --bind /var/lib/images/fedora / --dev-bind /dev /dev \
>        --bind /tmp/bpffs /sys/fs/bpf --bind ${PWD} /home/dave \
>        --cap-add CAP_BPF --proc /proc -- /bin/bash
>

I'm not familiar with all of the above, but see [0] for the exact
sequence of steps required. The FS context has to be created in the
child userns and passed to the privileged root userns, which actually
instantiates it and then passes the resulting FS FD back to the child
userns.

This is mandatory so that bpffs captures the unprivileged userns as its
owning userns (but a privileged root userns with CAP_SYS_ADMIN is still
necessary to actually create/finalize such a bpffs).

  [0] https://patchwork.kernel.org/project/netdevbpf/patch/20240124022127.2379740-17-andrii@xxxxxxxxxx/
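
To make the hand-off described above concrete, here is a minimal sketch
in C, not a copy of the selftest in [0]. It assumes the child (already
inside its new userns) and the privileged parent share a connected
AF_UNIX socket and pass descriptors with SCM_RIGHTS. The function names
(send_fd, recv_fd, child_get_bpffs, parent_materialize_bpffs) are
hypothetical, and the delegate_* values of "any" are taken from the
mount output quoted earlier. Creating the userns and its uid/gid maps
is left out.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <sys/syscall.h>
#include <sys/uio.h>
#include <unistd.h>
#include <linux/mount.h> /* FSCONFIG_SET_STRING, FSCONFIG_CMD_CREATE */

/* Control-message buffer sized and aligned for exactly one fd. */
union fd_cmsg {
	char buf[CMSG_SPACE(sizeof(int))];
	struct cmsghdr align;
};

/* Send one fd over a connected AF_UNIX socket; returns 0 or -1. */
int send_fd(int sock, int fd)
{
	char dummy = 'x';
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union fd_cmsg u;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&u, 0, sizeof(u));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd sent by send_fd(); returns the new fd or -1. */
int recv_fd(int sock)
{
	char dummy;
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union fd_cmsg u;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd;

	memset(&u, 0, sizeof(u));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;
	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS)
		return -1;
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}

/* Child, inside the new userns: fsopen() here so that bpffs records
 * this userns as its owner, then let the parent finish the job.
 * Returns the mount fd received from the parent, or -1. */
int child_get_bpffs(int sock)
{
	int fs_fd = syscall(__NR_fsopen, "bpf", 0);

	if (fs_fd < 0)
		return -1;
	if (send_fd(sock, fs_fd) < 0) {
		close(fs_fd);
		return -1;
	}
	close(fs_fd);
	return recv_fd(sock);
}

/* Parent, in the root userns with CAP_SYS_ADMIN: set the delegation
 * options, create the superblock, mount it, and send the mount fd back. */
int parent_materialize_bpffs(int sock)
{
	const char *opts[] = { "delegate_cmds", "delegate_maps",
			       "delegate_progs", "delegate_attachs" };
	int fs_fd, mnt_fd, err = -1;
	size_t i;

	fs_fd = recv_fd(sock);
	if (fs_fd < 0)
		return -1;
	for (i = 0; i < sizeof(opts) / sizeof(opts[0]); i++)
		if (syscall(__NR_fsconfig, fs_fd, FSCONFIG_SET_STRING,
			    opts[i], "any", 0) < 0)
			goto out;
	if (syscall(__NR_fsconfig, fs_fd, FSCONFIG_CMD_CREATE, NULL, NULL, 0) < 0)
		goto out;
	mnt_fd = syscall(__NR_fsmount, fs_fd, 0, 0);
	if (mnt_fd < 0)
		goto out;
	err = send_fd(sock, mnt_fd);
	close(mnt_fd);
out:
	close(fs_fd);
	return err;
}
```

The child would then open a directory FD on the returned mount (for
example with openat(mnt_fd, ".", O_RDWR)) and pass that as bpffs_fd to
BPF_TOKEN_CREATE.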

> 3. I’m then executing my BPF application inside the userns:
>
>    ./xdp-test
>
>
> However, what I’m observing is that my program is failing to load.
> strace confirms I’m getting -EPERM from BPF_TOKEN_CREATE.
>
> $ strace ./xdp-test
> ...
> open("/sys/fs/bpf", O_RDONLY|O_LARGEFILE|O_DIRECTORY) = 9
> bpf(BPF_TOKEN_CREATE, {token_create={flags=0, bpffs_fd=9}}, 152) = -1 EPERM (Operation not permitted)
> …
>
> I believe I have CAP_BPF inside the userns also:
>
> $ getpcaps 2 # pid of bash
> 2: cap_bpf=eip
>
> My machine is running on Kernel 6.9.4.
>
> The only difference I can see between my code and the selftest is that the
> selftest is performing the fsopen from within the userns, which looks like it
> is deliberate in order to check you can’t set delegation options from within
> the userns.

It's not just a check: doing the fsopen inside the userns is mandatory
so that bpffs captures it as the owning userns.

>
> It’s quite possible there’s a bug in my implementation but before I try the
> same operations with libbpf directly I’d really appreciate a sanity check
> that I’m using BPF Token in the correct way first.
>
> TL;DR
>
> If I create a bpffs in the init user ns, then bind mount it into a userns,
> will BPF Token work?
>
>

No, it won't. Also note that we currently disallow creating a BPF token
in the root userns.

> Thanks in advance,
>
> -- Dave
>
>