On Wed, May 11, 2022 at 02:19:26PM +0100, Alan Maguire wrote: > Unprivileged BPF disabled (kernel.unprivileged_bpf_disabled >= 1) > is the default in most cases now; when set, the BPF system call is > blocked for users without CAP_BPF/CAP_SYS_ADMIN. In some use cases > prior to having unpriviliged_bpf_disabled set to >= 1 however, > it made sense to split activities between capability-requiring > ones - such as program load+attach - and those that might not require > capabilities such as reading perf/ringbuf events, reading or > updating BPF map configuration etc. One example of this sort of > approach is a service that loads a BPF program, and a user-space > program that, after it has been loaded, interacts with it via > pinned maps. > > Such a split model is not possible with unprivileged BPF disabled, > since even those activities such as map interactions, retrieving > map information from pinned paths etc are blocked to > unprivileged users. However, granting CAP_BPF to such unprivileged > users is quite a big hammer, allowing them to do pretty much > anything with BPF. > > This very rough RFC explores the idea - with > CONFIG_BPF_UNPRIV_MAP_ACCESS=y - of allowing unprivileged processes > to retrieve and work with a restricted set of BPF maps. See > patch 1 for the restrictions on BPF cmd and map type. Note that > permission checks on maps are still enforced, so it's still > possible to prevent unwanted interference by unprivileged users > by pinning a map and setting permissions to prevent access. > CONFIG_BPF_UNPRIV_MAP_ACCESS defaults to n, preserving current > behaviour of blocking all BPF syscall commands. > > Discussion on the bpf mailing list already alluded to this idea [1], > though it's possible I misinterpreted it. Thanks for the follow up. We had a long discussion about this during lsfmmbpf. In short a bunch of folks wants to address this problem. Your summary of the problem is accurate. The patch though is overly cautious in fixing the issue. The bpf ACL model is the same as traditional file's ACL. The creds and ACLs are checked at open(). Then during file's write/read additional checks might be performed. BPF has such functionality already. Different map_creates have capability checks while map_lookup has: map_get_sys_perms(map, f) & FMODE_CAN_READ. In other words it's enough to gate FD-receiving parts of bpf with unprivileged_bpf_disabled sysctl. The rest is handled by availability of FD and access to files in bpffs. Additional kconfig is unnecessary. Also no need to filter different map types. Only array/hash/ringbuf are ok for unpriv when that sysctl is off. The rest of maps have their own cap_bpf checks at creation time which is enough. The patch 1 should probably be something like: if (!capable && (cmd == BPF_MAP_CREATE || cmd == BPF_PROG_LOAD)) return -EPERM; For all other commands the user has to have a valid map/btf/prog FD to proceed. There are few special commands that don't need to be in the above 'if': . BPF_[PROG|MAP|BTF]_GET_NEXT_ID have explicit cap_sys_admin check. . BPF_OBJ_GET is using traditional file ACLs to access. . BPF_BTF_LOAD has its own cap_bpf check. Of course there could be bugs in any of unpriv code paths, but they were enabled with sysctl off for long time. When/if new bugs are found they will be fixed. The unprivileged_bpf_disabled's default was flipped only because of HW bugs. In all HW spectre exploits BPF_PROG_LOAD was the mandatory command. Just disabling that command is enough. The BPF_MAP_CREATE alone cannot be used in spectre-like attack. But map_create without prog_load is a useless combination for unrpiv. So having sysctl affecting them both makes sense from symmetry pov. libbpf and other loaders typically create a map first, so with sysctl off the unpriv users will see those eperms first. I hope it's mostly accurate summary of the discussion.