On Fri, Jul 7, 2023 at 6:04 AM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
>
> Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> writes:
>
> > On Thu, Jul 6, 2023 at 4:32 AM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
> >>
> >> Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> writes:
> >>
> >> > Having it as a separate single-purpose FS seems cleaner, because we
> >> > have use cases where we'd have one BPF FS instance created for a
> >> > container by our container manager, and then exposing a few separate
> >> > tokens with different sets of allowed functionality. E.g., one for
> >> > main intended workload, another for some BPF-based observability
> >> > tools, maybe yet another for more heavy-weight tools like bpftrace for
> >> > extra debugging. In the debugging case our container infrastructure
> >> > will be "evacuating" any other workloads on the same host to avoid
> >> > unnecessary consequences. The point is to not disturb
> >> > workload-under-human-debugging as much as possible, so we'd like to
> >> > keep userns intact, which is why mounting extra (more permissive) BPF
> >> > token inside already running containers is an important consideration.
> >>
> >> This example (as well as Yafang's in the sibling subthread) makes it
> >> even more apparent to me that it would be better with a model where the
> >> userspace policy daemon can just make decisions on each call directly,
> >> instead of mucking about with different tokens with different embedded
> >> permissions. Why not go that route (see my other reply for details on
> >> what I mean)?
> >
> > I don't know how you arrived at this conclusion,
>
> Because it makes it apparent that you're basically building a policy
> engine in the kernel with this...

I disagree that this is a policy engine in the kernel. It's a building
block for delegation and enforcement. The policy itself is implemented
in user space by a privileged process that decides when to issue BPF
tokens and with which configuration. And, optionally and if necessary,
access can be restricted further using BPF LSM in a more fine-grained
and dynamic way.

>
> > but we've debated BPF proxying and separate service at length, there
> > is no point in going on another round here.
>
> You had some objections to explicit proxying via RPC calls; I suggested
> a way of avoiding that by keeping the kernel in the loop, which you have

I thought we settled the seccomp notify proposal?

> not responded to. If you're just going to go ahead with your solution
> over any objections you could just have stated so from the beginning and
> saved us all a lot of time :/

It would also be good to understand that yours is but one opinion among
several. If you read the thread carefully you'll see that other people
have differing opinions, and yours doesn't necessarily have to be the
deciding one.

I appreciate the feedback, but I don't appreciate the expectation that
your feedback is binding in any way.

>
> Can we at least put this thing behind a kconfig option, so we can turn
> it off in distro kernels?

Why can't a distro disable this in some more dynamic way, though? With
the existing LSM mechanism, a sysctl, whatever? I think it would be
useful to let users have control over this and decide for themselves
without having to rebuild a custom kernel.
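To make the "existing LSM mechanism" option a bit more concrete, here is
a minimal, purely illustrative sketch of an lsm/bpf program that a
distro or admin could ship and attach at boot to refuse BPF token
creation system-wide, without any kernel rebuild. It assumes a kernel
with CONFIG_BPF_LSM and the BPF_TOKEN_CREATE command this series adds;
the constant below is just a placeholder for whatever the series'
uapi/linux/bpf.h ends up defining:

  /* purely illustrative: a system-wide "off switch" for BPF tokens,
   * implemented with the existing BPF LSM machinery instead of a
   * kconfig knob; hypothetical, not part of this series
   */
  #include "vmlinux.h"
  #include <bpf/bpf_helpers.h>
  #include <bpf/bpf_tracing.h>

  char LICENSE[] SEC("license") = "GPL";

  /* placeholder for the enum value from this series' uapi/linux/bpf.h */
  #define BPF_TOKEN_CREATE_CMD 36

  SEC("lsm/bpf")
  int BPF_PROG(deny_token_create, int cmd, union bpf_attr *attr,
               unsigned int size)
  {
          if (cmd == BPF_TOKEN_CREATE_CMD)
                  return -EPERM; /* refuse to mint BPF tokens on this host */
          return 0; /* don't interfere with any other bpf() command */
  }

Something like this could ship as part of a distro's default policy and
be dropped by users who do want token delegation, which seems strictly
more flexible than a compile-time switch.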
>
> > Per-call decisions can be achieved nicely by employing BPF LSM in a
> > restrictive manner on top of BPF token (or no token, if you are ok
> > without user namespaces).
>
> Building a deficient security delegation mechanism and saying "you can
> patch things up using an LSM" is a terrible design, though. Also, this

A bunch of people disagree with you.

> still means you have to implement all the policy checks in the kernel
> (just in BPF) which is awkward at best.

"Patching things up" with a restrictive LSM, where necessary, is exactly
the model LSM folks prefer. You are also assuming that it's always
necessary, and I'm saying that in lots of practical contexts an LSM
won't even be necessary.

>
> -Toke
>
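And to make the "per-call decisions via BPF LSM in a restrictive manner
on top of BPF token" point concrete as well, here is another minimal,
hypothetical sketch (again, not part of this series): the token decides
whether the workload may use bpf() from inside its user namespace at
all, and an lsm/bpf program then narrows what each individual call is
allowed to do, e.g. by allowlisting program types for BPF_PROG_LOAD. A
real policy would of course scope this to the intended workload (by
cgroup, namespace, etc.) rather than applying it host-wide:

  /* hypothetical sketch of a restrictive per-call policy layered on
   * top of whatever a BPF token delegates; the allowlist is just an
   * example of what one particular workload might be granted
   */
  #include "vmlinux.h"
  #include <bpf/bpf_helpers.h>
  #include <bpf/bpf_tracing.h>

  char LICENSE[] SEC("license") = "GPL";

  SEC("lsm/bpf")
  int BPF_PROG(restrict_prog_load, int cmd, union bpf_attr *attr,
               unsigned int size)
  {
          if (cmd != BPF_PROG_LOAD)
                  return 0; /* leave other commands to other policy layers */

          switch (attr->prog_type) {
          case BPF_PROG_TYPE_SOCKET_FILTER: /* example allowlist entries */
          case BPF_PROG_TYPE_CGROUP_SKB:
                  return 0;
          default:
                  return -EPERM; /* deny any other program type per call */
          }
  }

That is the kind of thing I mean by LSM being an optional, more dynamic
layer on top of the delegation mechanism, not a replacement for it.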