> On Jul 30, 2019, at 1:24 PM, Andy Lutomirski <luto@xxxxxxxxxx> wrote:
>
> On Mon, Jul 29, 2019 at 10:07 PM Song Liu <songliubraving@xxxxxx> wrote:
>>
>> Hi Andy,
>>
>>> On Jul 27, 2019, at 11:20 AM, Song Liu <songliubraving@xxxxxx> wrote:
>>>
>>> Hi Andy,
>>>
>>> [...]
>>>
>>
>> I would like more comments on this.
>>
>> Currently, bpf permission is more or less "root or nothing", which we
>> would like to change.
>>
>> The short term goal is to separate bpf from root, in other words, it is
>> "all or nothing". Special user space utilities, such as systemd, would
>> benefit from this. Once this is implemented, systemd can call sys_bpf()
>> when it is not running as root.
>
> As generally nasty as Linux capabilities are, this sounds like a good
> use for CAP_BPF_ADMIN.

I actually agree CAP_BPF_ADMIN makes sense. The hard part is to make
existing tools (setcap, getcap, etc.) and libraries aware of the new CAP.

>
> But what do you have in mind? Isn't non-root systemd mostly just the
> user systemd session? That should *not* have bpf() privileges until
> bpf() is improved such that you can't use it to compromise the system.

cgroup bpf is the major use case here. A less important use case is to run
bpf selftests without being root.

>
>>
>> In longer term, it may be useful to provide finer grain permission of
>> sys_bpf(). For example, sys_bpf() should be aware of containers; and
>> user may only have access to certain bpf maps. Let's call this
>> "fine grain" capability.
>>
>>
>> Since we are seeing new use cases every year, we will need many
>> iterations to implement the fine grain permission. I think we need an
>> API that is flexible enough to cover different types of permission
>> control.
>>
>> For example, bpf_with_cap() can be flexible:
>>
>>        bpf_with_cap(cmd, attr, size, perm_fd);
>>
>> We can get different types of permission via different combinations of
>> arguments:
>>
>>    A perm_fd to /dev/bpf gives access to all sys_bpf() commands, so
>>    this is "all or nothing" permission.
>>
>>    A perm_fd to /sys/fs/cgroup/.../bpf.xxx would only allow some
>>    commands to this specific cgroup.
>>
>
> I don't see why you need to invent a whole new mechanism for this.
> The entire cgroup ecosystem outside bpf() does just fine using the
> write permission on files in cgroupfs to control access. Why can't
> bpf() do the same thing?

It is easier to use write permission for BPF_PROG_ATTACH. But it is
not easy to do the same for other bpf commands: BPF_PROG_LOAD and
BPF_MAP_*. A lot of these commands don't have a target concept. Maybe
we should have a target concept for all these commands. But that is a
much bigger project. OTOH, the "all or nothing" model allows all these
commands at once.

Well, that being said, I will look more into using write permission
in cgroupfs.

Thanks again for all these comments and suggestions. Please let us
know your future thoughts and insights.

Song
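
To make the two perm_fd cases quoted above concrete, here is a rough
userspace model of the decision bpf_with_cap() would have to make. It is
only a sketch under assumptions, not kernel code and not a proposed
implementation: every sketch_* name is hypothetical, the command classes are
not the kernel's enum bpf_cmd, and the thread does not say which commands
a cgroup perm_fd would allow, so "attach/detach on that cgroup only" is
assumed here.

```c
#include <stdbool.h>

/* Hypothetical command classes (NOT the kernel's enum bpf_cmd values). */
enum sketch_cmd {
	SKETCH_PROG_LOAD,
	SKETCH_MAP_CREATE,
	SKETCH_PROG_ATTACH,
	SKETCH_PROG_DETACH,
};

/* Hypothetical classification of what perm_fd was opened on. */
enum sketch_perm_kind {
	SKETCH_PERM_NONE,	/* no usable perm_fd */
	SKETCH_PERM_ALL,	/* perm_fd refers to /dev/bpf */
	SKETCH_PERM_CGROUP,	/* perm_fd refers to a file in one cgroup */
};

struct sketch_perm {
	enum sketch_perm_kind kind;
	int cgroup_id;		/* only meaningful for SKETCH_PERM_CGROUP */
};

/*
 * Decide whether cmd is allowed under perm.
 * target_cgroup_id is the cgroup the command acts on, or -1 for
 * commands that have no target (e.g. program load, map creation).
 */
static bool sketch_cmd_allowed(const struct sketch_perm *perm,
			       enum sketch_cmd cmd, int target_cgroup_id)
{
	switch (perm->kind) {
	case SKETCH_PERM_ALL:
		/* "all or nothing": every command is allowed */
		return true;
	case SKETCH_PERM_CGROUP:
		/* assumption: only attach/detach, and only on this cgroup */
		if (cmd != SKETCH_PROG_ATTACH && cmd != SKETCH_PROG_DETACH)
			return false;
		return target_cgroup_id == perm->cgroup_id;
	case SKETCH_PERM_NONE:
	default:
		return false;
	}
}
```

The model also shows the difficulty raised in the reply: commands with no
target (load, map creation) have nothing to compare against a cgroup
perm_fd, so in this sketch they only pass under the /dev/bpf case.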