On Fri, Feb 25, 2022 at 03:43:37PM -0800, Hao Luo wrote: > After we introduced sleepable tracing programs, we now have an > interesting problem. There are now three execution paths that can > reach bpf_sys_bpf: > > 1. called from bpf syscall. > 2. called from kernel context (e.g. kernel modules). > 3. called from bpf programs. > > Ideally, capability check in bpf_sys_bpf is necessary for the first two > scenarios. But it may not be necessary for the third case. Well, it's unnecessary for the first two as well. When called from the kernel lskel it's a pointless check. The kernel module can do anything regardless. When called from bpf syscall program it's not quite correct either. When CAP_BPF was introduced we've designed it to enforce permissions at prog load time. The prog_run doesn't check permissions. So syscall progs don't need this secondary permission check. Please add "case BPF_PROG_TYPE_SYSCALL:" to is_perfmon_prog_type() and combine it with this patch. That would be the best. The alternative below is less appealing. > An alternative of lifting this permission check would be introducing an > 'unpriv' version of bpf_sys_bpf, which doesn't check the current task's > capability. If the owner of the tracing prog wants it to be exclusively > used by root users, they can use the 'priv' version of bpf_sys_bpf; if > the owner wants it to be usable for non-root users, they can use the > 'unpriv' version. ... > - if (sysctl_unprivileged_bpf_disabled && !bpf_capable()) > + if (sysctl_unprivileged_bpf_disabled && !bpf_capable() && !uattr.is_kernel) This is great idea. If I could think of this I would went with it when prog_syscall was introduced.