Re: [PATCH v11 bpf-next 02/17] bpf: add BPF token delegation mount options to BPF FS

Christian Brauner <brauner@xxxxxxxxxx> · Thu, 30 Nov 2023 15:18:33 +0100

On Mon, Nov 27, 2023 at 11:03:54AM -0800, Andrii Nakryiko wrote:
> Add few new mount options to BPF FS that allow to specify that a given
> BPF FS instance allows creation of BPF token (added in the next patch),
> and what sort of operations are allowed under BPF token. As such, we get
> 4 new mount options, each is a bit mask
>   - `delegate_cmds` allow to specify which bpf() syscall commands are
>     allowed with BPF token derived from this BPF FS instance;
>   - if BPF_MAP_CREATE command is allowed, `delegate_maps` specifies
>     a set of allowable BPF map types that could be created with BPF token;
>   - if BPF_PROG_LOAD command is allowed, `delegate_progs` specifies
>     a set of allowable BPF program types that could be loaded with BPF token;
>   - if BPF_PROG_LOAD command is allowed, `delegate_attachs` specifies
>     a set of allowable BPF program attach types that could be loaded with
>     BPF token; delegate_progs and delegate_attachs are meant to be used
>     together, as full BPF program type is, in general, determined
>     through both program type and program attach type.
> 
> Currently, these mount options accept the following forms of values:
>   - a special value "any", that enables all possible values of a given
>   bit set;
>   - numeric value (decimal or hexadecimal, determined by kernel
>   automatically) that specifies a bit mask value directly;
>   - all the values for a given mount option are combined, if specified
>   multiple times. E.g., `mount -t bpf nodev /path/to/mount -o
>   delegate_maps=0x1 -o delegate_maps=0x2` will result in a combined 0x3
>   mask.
> 
> Ideally, more convenient (for humans) symbolic form derived from
> corresponding UAPI enums would be accepted (e.g., `-o
> delegate_progs=kprobe|tracepoint`) and I intend to implement this, but
> it requires a bunch of UAPI header churn, so I postponed it until this
> feature lands upstream or at least there is a definite consensus that
> this feature is acceptable and is going to make it, just to minimize
> amount of wasted effort and not increase amount of non-essential code to
> be reviewed.
> 
> Attentive reader will notice that BPF FS is now marked as
> FS_USERNS_MOUNT, which theoretically makes it mountable inside non-init
> user namespace as long as the process has sufficient *namespaced*
> capabilities within that user namespace. But in reality we still
> restrict BPF FS to be mountable only by processes with CAP_SYS_ADMIN *in
> init userns* (extra check in bpf_fill_super()). FS_USERNS_MOUNT is added
> to allow creating BPF FS context object (i.e., fsopen("bpf")) from
> inside unprivileged process inside non-init userns, to capture that
> userns as the owning userns. It will still be required to pass this
> context object back to privileged process to instantiate and mount it.
> 
> This manipulation is important, because capturing non-init userns as the
> owning userns of BPF FS instance (super block) allows to use that userns
> to constraint BPF token to that userns later on (see next patch). So
> creating BPF FS with delegation inside unprivileged userns will restrict
> derived BPF token objects to only "work" inside that intended userns,
> making it scoped to a intended "container". Also, setting these
> delegation options requires capable(CAP_SYS_ADMIN), so unprivileged
> process cannot set this up without involvement of a privileged process.
> 
> There is a set of selftests at the end of the patch set that simulates
> this sequence of steps and validates that everything works as intended.
> But careful review is requested to make sure there are no missed gaps in
> the implementation and testing.
> 
> This somewhat subtle set of aspects is the result of previous
> discussions ([0]) about various user namespace implications and
> interactions with BPF token functionality and is necessary to contain
> BPF token inside intended user namespace.
> 
>   [0] https://lore.kernel.org/bpf/20230704-hochverdient-lehne-eeb9eeef785e@brauner/
> 
> Signed-off-by: Andrii Nakryiko <andrii@xxxxxxxxxx>
> ---

I still think this is a little weird because this isn't really
unprivileged bpf and it isn't really safe bpf as well.

All this does is allow an administrator to punch a big fat hole into an
unprivileged container so workloads get to play with their favorite toy.

I think that having a way to have signed bpf programs in addition to
this would be much more interesting to generic workloads that don't know
who or what they can trust.

And there's a few things to remember:

* This absolutely isn't a safety mechanism.
* This absolutely isn't safe to enable generically in containers.
* This is a workaround and not a solution to unprivileged bpf.

And this is an ACK solely on the code of this patch, not the concept.
Acked-by: Christian Brauner <brauner@xxxxxxxxxx> (reluctantly)