Re: [PATCH RESEND v3 bpf-next 00/14] BPF token

Toke Høiland-Jørgensen <toke@xxxxxxxxxx> · Wed, 05 Jul 2023 01:33:28 +0200

Christian Brauner <brauner@xxxxxxxxxx> writes:

> On Fri, Jun 30, 2023 at 01:15:47AM +0200, Toke Høiland-Jørgensen wrote:
>> Andrii Nakryiko <andrii@xxxxxxxxxx> writes:
>> 
>> > This patch set introduces a new BPF object, the BPF token, which allows delegating
>> > a subset of BPF functionality from a privileged system-wide daemon (e.g.,
>> > systemd or any other container manager) to a *trusted* unprivileged
>> > application. Trust is the key here. This functionality is not about allowing
>> > unconditional unprivileged BPF usage. Establishing trust, though, is
>> > completely up to the discretion of the respective privileged application that
>> > would create a BPF token, as different production setups can and do achieve it
>> > through a combination of different means (signing, LSM, code reviews, etc.),
>> > and it's undesirable and infeasible for the kernel to enforce any particular way
>> > of validating the trustworthiness of a particular process.
>> >
>> > The main motivation for BPF token is a desire to enable containerized
>> > BPF applications to be used together with user namespaces. This is currently
>> > impossible, as CAP_BPF, required for BPF subsystem usage, cannot be namespaced
>> > or sandboxed, as a general rule. E.g., tracing BPF programs, thanks to BPF
>> > helpers like bpf_probe_read_kernel() and bpf_probe_read_user(), can safely read
>> > arbitrary memory, and it's impossible to ensure that they only read memory of
>> > processes belonging to any given namespace. This means that it's impossible to
>> > have a namespace-aware CAP_BPF capability, and as such another mechanism to
>> > allow safe usage of BPF functionality is necessary. The BPF token, delegated
>> > to trusted unprivileged applications, is such a mechanism. The kernel makes
>> > no assumption about what "trusted" constitutes in any particular case, and
>> > it's up to specific privileged applications and their surrounding
>> > infrastructure to decide that. What the kernel provides is a set of APIs to
>> > create and tune a BPF token, and pass it around to privileged BPF commands
>> > that create new BPF objects like BPF programs, BPF maps, etc.
>> 
>> So a colleague pointed out today that the Seccomp Notify functionality
>> would be a way to achieve your stated goal of allowing unprivileged
>> containers to (selectively) perform bpf() syscall operations. Christian
>> Brauner has a pretty nice writeup of the functionality here:
>> https://people.kernel.org/brauner/the-seccomp-notifier-new-frontiers-in-unprivileged-container-development
>
> I'm amazed you read this. :)

I found it quite an enjoyable read, actually :)

> The seccomp notifier comes with a lot of caveats. I think it would be
> impractical if not infeasible to handle bpf() delegation.

Right, thank you for chiming in and explaining the context. I replied
elsewhere in the thread on the content, so let's not fork the discussion
any more than we have to...

-Toke