On Sun, Aug 4, 2019 at 5:08 PM Andy Lutomirski <luto@xxxxxxxxxx> wrote: > > On Sun, Aug 4, 2019 at 3:16 PM Andy Lutomirski <luto@xxxxxxxxxx> wrote: > > > > On Fri, Aug 2, 2019 at 12:22 AM Song Liu <songliubraving@xxxxxx> wrote: > > > > > > Hi Andy, > > > > > >> I actually agree CAP_BPF_ADMIN makes sense. The hard part is to make > > > >> existing tools (setcap, getcap, etc.) and libraries aware of the new CAP. > > > > > > > > It's been done before -- it's not that hard. IMO the main tricky bit > > > > would be try be somewhat careful about defining exactly what > > > > CAP_BPF_ADMIN does. > > > > > > Agreed. I think defining CAP_BPF_ADMIN could be a good topic for the > > > Plumbers conference. > > > > > > OTOH, I don't think we have to wait for CAP_BPF_ADMIN to allow daemons > > > like systemd to do sys_bpf() without root. > > > > I don't understand the use case here. Are you talking about systemd > > --user? As far as I know, a user is expected to be able to fully > > control their systemd --user process, so giving it unrestricted bpf > > access is very close to giving it superuser access, and this doesn't > > sound like a good idea. I think that, if systemd --user needs bpf(), > > it either needs real unprivileged bpf() or it needs a privileged > > helper (SUID or a daemon) to intermediate this access. > > > > > > > > > > > > >>> I don't see why you need to invent a whole new mechanism for this. > > > >>> The entire cgroup ecosystem outside bpf() does just fine using the > > > >>> write permission on files in cgroupfs to control access. Why can't > > > >>> bpf() do the same thing? > > > >> > > > >> It is easier to use write permission for BPF_PROG_ATTACH. But it is > > > >> not easy to do the same for other bpf commands: BPF_PROG_LOAD and > > > >> BPF_MAP_*. A lot of these commands don't have target concept. Maybe > > > >> we should have target concept for all these commands. But that is a > > > >> much bigger project. OTOH, "all or nothing" model allows all these > > > >> commands at once. > > > > > > > > For BPF_PROG_LOAD, I admit I've never understood why permission is > > > > required at all. I think that CAP_SYS_ADMIN or similar should be > > > > needed to get is_priv in the verifier, but I think that should mainly > > > > be useful for tracing, and that requires lots of privilege anyway. > > > > BPF_MAP_* is probably the trickiest part. One solution would be some > > > > kind of bpffs, but I'm sure other solutions are possible. > > > > > > Improving permission management of cgroup_bpf is another good topic to > > > discuss. However, it is also an overkill for current use case. > > > > > > > I looked at the code some more, and I don't think this is so hard > > after all. As I understand it, all of the map..by_id stuff is, to > > some extent, deprecated in favor of persistent maps. As I see it, the > > map..by_id calls should require privilege forever, although I can > > imagine ways to scope that privilege to a namespace if the maps > > themselves were to be scoped to a namespace. > > > > Instead, unprivileged tools would use the persistent map interface > > roughly like this: > > > > $ bpftool map create /sys/fs/bpf/my_dir/filename type hash key 8 value > > 8 entries 64 name mapname > > > > This would require that the caller have either CAP_DAC_OVERRIDE or > > that the caller have permission to create files in /sys/fs/bpf/my_dir > > (using the same rules as for any filesystem), and the resulting map > > would end up owned by the creating user and have mode 0600 (or maybe > > 0666, or maybe a new bpf_attr parameter) modified by umask. Then all > > the various capable() checks that are currently involved in accessing > > a persistent map would instead check FMODE_READ or FMODE_WRITE on the > > map file as appropriate. > > > > Half of this stuff already works. I just set my system up like this: > > > > $ ls -l /sys/fs/bpf > > total 0 > > drwxr-xr-x. 3 luto luto 0 Aug 4 15:10 luto > > > > $ mkdir /sys/fs/bpf/luto/test > > > > $ ls -l /sys/fs/bpf/luto > > total 0 > > drwxrwxr-x. 2 luto luto 0 Aug 4 15:10 test > > > > I bet that making the bpf() syscalls work appropriately in this > > context without privilege would only be a couple of hours of work. > > The hard work, creating bpffs and making it function, is already done > > :) > > > > P.S. The docs for bpftool create are less than fantastic. The > > complete lack of any error message at all when the syscall returns > > -EACCES is also not fantastic. > > This isn't remotely finished, but I spent a bit of time fiddling with this: > > https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=bpf/perms > > What do you think? (It's obviously not done. It doesn't compile, and > I haven't gotten to the permissions needed to do map operations. I > also haven't touched the capable() checks.) I updated the branch. It compiles, and basic map functionality works! # mount -t bpf bpf /sys/fs/bpf # cd /sys/fs/bpf # mkdir luto # chown luto: luto # setpriv --euid=1000 --ruid=1000 bash $ pwd /sys/fs/bpf bash-5.0$ ls -l total 0 drwxr-xr-x 2 luto luto 0 Aug 4 22:41 luto bash-5.0$ bpftool map create /sys/fs/bpf/luto/filename type hash key 8 value 8 entries 64 name mapname bash-5.0$ bpftool map dump pinned /sys/fs/bpf/luto/filename Found 0 elements # chown root: /sys/fs/bpf/luto/filename $ bpftool map dump pinned /sys/fs/bpf/luto/filename Error: bpf obj get (/sys/fs/bpf/luto): Permission denied So I think it's possible to get a respectable subset of bpf() functionality working without privilege in short order :) (FWIW, a decent fraction of this probably works even without my patches, but it's going to have nonsensical semantics and may fail for silly reasons.)