On Wed, Jul 31, 2019 at 1:10 AM Song Liu <songliubraving@xxxxxx> wrote:
>
> > On Jul 30, 2019, at 1:24 PM, Andy Lutomirski <luto@xxxxxxxxxx> wrote:
> >
> > On Mon, Jul 29, 2019 at 10:07 PM Song Liu <songliubraving@xxxxxx> wrote:
> >>
> >> Hi Andy,
> >>
> >>> On Jul 27, 2019, at 11:20 AM, Song Liu <songliubraving@xxxxxx> wrote:
> >>>
> >>> Hi Andy,
> >>>
>
> [...]
>
> >>
> >> I would like more comments on this.
> >>
> >> Currently, bpf permission is more or less "root or nothing", which we
> >> would like to change.
> >>
> >> The short term goal is to separate bpf from root, in other words, it is
> >> "all or nothing". Special user space utilities, such as systemd, would
> >> benefit from this. Once this is implemented, systemd can call sys_bpf()
> >> when it is not running as root.
> >
> > As generally nasty as Linux capabilities are, this sounds like a good
> > use for CAP_BPF_ADMIN.
>
> I actually agree CAP_BPF_ADMIN makes sense. The hard part is to make
> existing tools (setcap, getcap, etc.) and libraries aware of the new CAP.

It's been done before -- it's not that hard. IMO the main tricky bit
would be to try to be somewhat careful about defining exactly what
CAP_BPF_ADMIN does.

> > I don't see why you need to invent a whole new mechanism for this.
> > The entire cgroup ecosystem outside bpf() does just fine using the
> > write permission on files in cgroupfs to control access. Why can't
> > bpf() do the same thing?
>
> It is easier to use write permission for BPF_PROG_ATTACH. But it is
> not easy to do the same for other bpf commands: BPF_PROG_LOAD and
> BPF_MAP_*. A lot of these commands don't have target concept. Maybe
> we should have target concept for all these commands. But that is a
> much bigger project. OTOH, "all or nothing" model allows all these
> commands at once.

For BPF_PROG_LOAD, I admit I've never understood why permission is
required at all. I think that CAP_SYS_ADMIN or similar should be
needed to get is_priv in the verifier, but I think that should mainly
be useful for tracing, and that requires lots of privilege anyway.

BPF_MAP_* is probably the trickiest part. One solution would be some
kind of bpffs, but I'm sure other solutions are possible.
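
To make the cgroupfs comparison concrete, here is a minimal sketch of what a
file-permission model for BPF_PROG_ATTACH could look like. It is an
illustration only, not kernel code: may_attach() and its parameters are
hypothetical names, and the check covers just the classic owner/group/other
write bits, ignoring ACLs, LSMs, and root's CAP_DAC_OVERRIDE.

```python
import stat


def may_attach(dir_mode, dir_uid, dir_gid, uid, gids):
    """Hypothetical policy: allow BPF_PROG_ATTACH only if the caller
    could write to the target cgroup directory.

    dir_mode, dir_uid, dir_gid: permission bits and ownership of the
    cgroup directory (as stat() would report them).
    uid, gids: the caller's user ID and set of group IDs.

    Assumption: plain Unix DAC only. The first matching class
    (owner, then group, then other) decides the outcome.
    """
    if uid == dir_uid:
        return bool(dir_mode & stat.S_IWUSR)
    if dir_gid in gids:
        return bool(dir_mode & stat.S_IWGRP)
    return bool(dir_mode & stat.S_IWOTH)


if __name__ == "__main__":
    # A cgroup subtree delegated to uid 1000: the owner may attach.
    print(may_attach(0o755, 1000, 1000, uid=1000, gids={1000}))
    # A root-owned cgroup: the same user may not.
    print(may_attach(0o755, 0, 0, uid=1000, gids={1000}))
```

Under this model the "target" of the command is the cgroup directory, so
delegating a subtree to an unprivileged user (by chown) would also delegate
the right to attach programs there. Commands without an obvious target, such
as BPF_PROG_LOAD and BPF_MAP_*, are not covered by a check of this shape.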