Re: [RFC bpf-next 0/2] bpf: allow unprivileged map access to some map types

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Wed, 11 May 2022 09:36:04 -0700

On Wed, May 11, 2022 at 02:19:26PM +0100, Alan Maguire wrote:
> Unprivileged BPF disabled (kernel.unprivileged_bpf_disabled >= 1)
> is the default in most cases now; when set, the BPF system call is
> blocked for users without CAP_BPF/CAP_SYS_ADMIN.  In some use cases
> prior to having unpriviliged_bpf_disabled set to >= 1 however,
> it made sense to split activities between capability-requiring
> ones - such as program load+attach - and those that might not require
> capabilities such as reading perf/ringbuf events, reading or
> updating BPF map configuration etc.  One example of this sort of
> approach is a service that loads a BPF program, and a user-space
> program that, after it has been loaded, interacts with it via
> pinned maps.
> 
> Such a split model is not possible with unprivileged BPF disabled,
> since even those activities such as map interactions, retrieving
> map information from pinned paths etc are blocked to
> unprivileged users.  However, granting CAP_BPF to such unprivileged
> users is quite a big hammer, allowing them to do pretty much
> anything with BPF.
> 
> This very rough RFC explores the idea - with
> CONFIG_BPF_UNPRIV_MAP_ACCESS=y - of allowing unprivileged processes
> to retrieve and work with a restricted set of BPF maps.  See
> patch 1 for the restrictions on BPF cmd and map type. Note that
> permission checks on maps are still enforced, so it's still
> possible to prevent unwanted interference by unprivileged users
> by pinning a map and setting permissions to prevent access.
> CONFIG_BPF_UNPRIV_MAP_ACCESS defaults to n, preserving current
> behaviour of blocking all BPF syscall commands.
> 
> Discussion on the bpf mailing list already alluded to this idea [1],
> though it's possible I misinterpreted it.

Thanks for the follow up. We had a long discussion about this during lsfmmbpf.
In short a bunch of folks wants to address this problem.
Your summary of the problem is accurate.
The patch though is overly cautious in fixing the issue.
The bpf ACL model is the same as traditional file's ACL.
The creds and ACLs are checked at open().  Then during file's write/read
additional checks might be performed. BPF has such functionality already. 
Different map_creates have capability checks while map_lookup has:
map_get_sys_perms(map, f) & FMODE_CAN_READ.
In other words it's enough to gate FD-receiving parts of bpf
with unprivileged_bpf_disabled sysctl.
The rest is handled by availability of FD and access to files in bpffs.
Additional kconfig is unnecessary. Also no need to filter
different map types. Only array/hash/ringbuf are ok for unpriv
when that sysctl is off. The rest of maps have their own cap_bpf checks
at creation time which is enough.
The patch 1 should probably be something like:
  if (!capable &&
      (cmd == BPF_MAP_CREATE || cmd == BPF_PROG_LOAD))
    return -EPERM;
For all other commands the user has to have a valid map/btf/prog FD to proceed.
There are few special commands that don't need to be in the above 'if':
. BPF_[PROG|MAP|BTF]_GET_NEXT_ID have explicit cap_sys_admin check.
. BPF_OBJ_GET is using traditional file ACLs to access.
. BPF_BTF_LOAD has its own cap_bpf check.

Of course there could be bugs in any of unpriv code paths, but they were
enabled with sysctl off for long time. When/if new bugs are found they will be fixed.
The unprivileged_bpf_disabled's default was flipped only because of HW bugs.
In all HW spectre exploits BPF_PROG_LOAD was the mandatory command.
Just disabling that command is enough. The BPF_MAP_CREATE alone
cannot be used in spectre-like attack. But map_create without prog_load
is a useless combination for unrpiv. So having sysctl affecting them
both makes sense from symmetry pov. libbpf and other loaders typically
create a map first, so with sysctl off the unpriv users will see those eperms first.
I hope it's mostly accurate summary of the discussion.