On Wed, May 13, 2020 at 07:30:05PM +0100, Marek Majkowski wrote:
> On Wed, May 13, 2020 at 6:53 PM Alexei Starovoitov
> <alexei.starovoitov@xxxxxxxxx> wrote:
> > On Wed, May 13, 2020 at 11:50:42AM +0100, Marek Majkowski wrote:
> > > On Wed, May 13, 2020 at 4:19 AM Alexei Starovoitov
> > > <alexei.starovoitov@xxxxxxxxx> wrote:
> > > >
> > > > CAP_BPF solves three main goals:
> > > > 1. provides isolation to user space processes that drop CAP_SYS_ADMIN and switch to CAP_BPF.
> > > >    More on this below. This is the major difference vs v4 set back from Sep 2019.
> > > > 2. makes networking BPF progs more secure, since CAP_BPF + CAP_NET_ADMIN
> > > >    prevents pointer leaks and arbitrary kernel memory access.
> > > > 3. enables fuzzers to exercise all of the verifier logic. Eventually finding bugs
> > > >    and making BPF infra more secure. Currently fuzzers run in unpriv.
> > > >    They will be able to run with CAP_BPF.
> > > >
> > >
> > > Alexei, looking at this from a user point of view, this looks fine.
> > >
> > > I'm slightly worried about REUSEPORT_EBPF. Currently without your
> > > patch, as far as I understand it:
> > >
> > > - You can load SOCKET_FILTER and SO_ATTACH_REUSEPORT_EBPF without any
> > > permissions
> >
> > correct.
> >
> > > - For loading BPF_PROG_TYPE_SK_REUSEPORT program and for SOCKARRAY map
> > > creation CAP_SYS_ADMIN is needed. But again, no permissions check for
> > > SO_ATTACH_REUSEPORT_EBPF later.
> >
> > correct. With clarification that attaching process needs to own
> > FD of prog and FD of socket.
> >
> > > If I read the patchset correctly, the former SOCKET_FILTER case
> > > remains as it is and is not affected in any way by presence or absence
> > > of CAP_BPF.
> >
> > correct. As commit log says:
> > "Existing unprivileged BPF operations are not affected."
> >
> > > The latter case is different. Presence of CAP_BPF is sufficient for
> > > map creation, but not sufficient for loading SK_REUSEPORT program. It
> > > still requires CAP_SYS_ADMIN.
> >
> > Not quite.
> > The patch will allow BPF_PROG_TYPE_SK_REUSEPORT progs to be loaded
> > with CAP_BPF + CAP_NET_ADMIN.
> > Since this type of progs is clearly networking type I figured it's
> > better to be consistent with the rest of networking types.
> > Two unpriv types SOCKET_FILTER and CGROUP_SKB is the only exception.
>
> Ok, this is the controversy. It made sense to restrict SK_REUSEPORT
> programs in the past, because programs needed CAP_NET_ADMIN to create
> SOCKARRAY anyway.

Not quite. Currently sockarray needs CAP_SYS_ADMIN to create,
which makes little sense from a security pov.
CAP_BPF relaxes it to CAP_BPF or CAP_SYS_ADMIN.

> Now we change this and CAP_BPF is sufficient for
> maps - I don't see why CAP_BPF is not sufficient for SK_REUSEPORT
> programs. From a user point of view I don't get why this additional
> CAP_NET_ADMIN is needed.

That actually brings up another point. I'm not changing sock_map,
sock_hash, dev_map requirements yet. All three still require CAP_NET_ADMIN.
We can relax them to CAP_BPF _or_ CAP_NET_ADMIN in the future,
but I'd like to do that in a follow-up.

>
> > > I think it's a good opportunity to relax
> > > this CAP_SYS_ADMIN requirement. I think the presence of CAP_BPF should
> > > be sufficient for loading BPF_PROG_TYPE_SK_REUSEPORT.
> > >
> > > Our specific use case is simple - we want an application program -
> > > like nginx - to control REUSEPORT programs. We will grant it CAP_BPF,
> > > but we don't want to grant it CAP_SYS_ADMIN.
> >
> > You'll be able to grant nginx CAP_BPF + CAP_NET_ADMIN to load SK_REUSEPORT
> > and unpriv child process will be able to attach just like before if
> > it has right FDs.
> > I suspect your load balancer needs CAP_NET_ADMIN already anyway due to
> > use of XDP and TC progs.
> > So granting CAP_BPF + CAP_NET_ADMIN should cover all bpf prog needs.
> > Does it address your concern?
>
> Load balancer (XDP+TC) is another layer and permissions there are not
> a problem. The specific issue is nginx (port 443) and QUIC. QUIC is
> UDP and due to the nginx design we must use REUSEPORT groups to
> balance the load across workers. This is fine and could be done with a
> simple SOCK_FILTER - we don't need to grant nginx any permissions,
> apart from CAP_NET_BIND_SERVICE.
>
> We would like to make the REUSEPORT program more complex to take
> advantage of REUSEPORT_EBPF for stickiness (restarting server without
> interfering with existing flows), we are happy to grant nginx CAP_BPF,
> but we are not happy to grant it CAP_NET_ADMIN. Requiring this CAP for
> REUSEPORT severely restricts the API usability for us.
>
> In my head REUSEPORT_EBPF is much closer to SOCKET_FILTER. I
> understand why it needed capabilities before (map creation) and I
> argue these reasons go away in CAP_BPF world. I assume that any
> service (with CAP_BPF) should be able to use reuseport to distribute
> packets within its own sockets. Let me know if I'm missing something.

Fair enough. We can include SK_REUSEPORT prog type as part of CAP_BPF alone.
But will it truly achieve what you want?
You still need CAP_NET_ADMIN for sock_hash, which you're using.
Are you saying it's part of a different process that has CAP_NET_ADMIN,
and nginx will be fine with CAP_BPF + CAP_NET_BIND_SERVICE?
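
For reference, the prog-load rules discussed in this thread can be written
down as a small model. This is only a sketch, not the kernel's check: every
name in it (prog_kind, caps, may_load_prog, reuseport_bpf_only) is made up
for illustration, it assumes CAP_SYS_ADMIN keeps covering everything it
covers today, and it ignores map creation (sock_hash etc.) and any sysctl
that restricts unprivileged BPF.

```c
#include <stdbool.h>

/* Hypothetical names throughout: a model of the load-time rules as
 * described in this thread, not the kernel's actual check. */
enum prog_kind {
	KIND_SOCKET_FILTER,	/* unprivileged type, unaffected by CAP_BPF */
	KIND_CGROUP_SKB,	/* unprivileged type, unaffected by CAP_BPF */
	KIND_SK_REUSEPORT,
	KIND_OTHER_NET,		/* e.g. XDP, TC */
};

struct caps {
	bool sys_admin;
	bool bpf;
	bool net_admin;
};

/*
 * reuseport_bpf_only == false: SK_REUSEPORT needs CAP_BPF + CAP_NET_ADMIN.
 * reuseport_bpf_only == true:  SK_REUSEPORT needs CAP_BPF alone.
 */
static bool may_load_prog(enum prog_kind kind, struct caps c,
			  bool reuseport_bpf_only)
{
	/* The two unprivileged types load without any capability. */
	if (kind == KIND_SOCKET_FILTER || kind == KIND_CGROUP_SKB)
		return true;

	/* Assumption: CAP_SYS_ADMIN alone still suffices, as it does today. */
	if (c.sys_admin)
		return true;

	/* Everything else needs at least CAP_BPF. */
	if (!c.bpf)
		return false;

	/* The relaxed variant: CAP_BPF alone is enough for SK_REUSEPORT. */
	if (kind == KIND_SK_REUSEPORT && reuseport_bpf_only)
		return true;

	/* Remaining networking types also need CAP_NET_ADMIN. */
	return c.net_admin;
}
```

With a caps value of { .bpf = true } (the nginx case above, leaving out
CAP_NET_BIND_SERVICE since it plays no part in prog load), may_load_prog()
returns false for KIND_SK_REUSEPORT when reuseport_bpf_only is false and
true when it is true.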