On Wed, May 13, 2020 at 7:54 PM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Wed, May 13, 2020 at 07:30:05PM +0100, Marek Majkowski wrote:
> > On Wed, May 13, 2020 at 6:53 PM Alexei Starovoitov
> > <alexei.starovoitov@xxxxxxxxx> wrote:
> > > On Wed, May 13, 2020 at 11:50:42AM +0100, Marek Majkowski wrote:
> > > > On Wed, May 13, 2020 at 4:19 AM Alexei Starovoitov
> > > > <alexei.starovoitov@xxxxxxxxx> wrote:
> > > > >
> > > > > CAP_BPF solves three main goals:
> > > > > 1. provides isolation to user space processes that drop CAP_SYS_ADMIN and switch to CAP_BPF.
> > > > > More on this below. This is the major difference vs v4 set back from Sep 2019.
> > > > > 2. makes networking BPF progs more secure, since CAP_BPF + CAP_NET_ADMIN
> > > > > prevents pointer leaks and arbitrary kernel memory access.
> > > > > 3. enables fuzzers to exercise all of the verifier logic. Eventually finding bugs
> > > > > and making BPF infra more secure. Currently fuzzers run in unpriv.
> > > > > They will be able to run with CAP_BPF.
> > > > >
> > > >
> > > > Alexei, looking at this from a user point of view, this looks fine.
> > > >
> > > > I'm slightly worried about REUSEPORT_EBPF. Currently without your
> > > > patch, as far as I understand it:
> > > >
> > > > - You can load SOCKET_FILTER and SO_ATTACH_REUSEPORT_EBPF without any
> > > > permissions
> > >
> > > correct.
> > >
> > > > - For loading BPF_PROG_TYPE_SK_REUSEPORT program and for SOCKARRAY map
> > > > creation CAP_SYS_ADMIN is needed. But again, no permissions check for
> > > > SO_ATTACH_REUSEPORT_EBPF later.
> > >
> > > correct. With clarification that attaching process needs to own
> > > FD of prog and FD of socket.
> > >
> > > > If I read the patchset correctly, the former SOCKET_FILTER case
> > > > remains as it is and is not affected in any way by presence or absence
> > > > of CAP_BPF.
> > >
> > > correct. As commit log says:
> > > "Existing unprivileged BPF operations are not affected."
> > >
> > > > The latter case is different. Presence of CAP_BPF is sufficient for
> > > > map creation, but not sufficient for loading SK_REUSEPORT program. It
> > > > still requires CAP_SYS_ADMIN.
> > >
> > > Not quite.
> > > The patch will allow BPF_PROG_TYPE_SK_REUSEPORT progs to be loaded
> > > with CAP_BPF + CAP_NET_ADMIN.
> > > Since this type of progs is clearly networking type I figured it's
> > > better to be consistent with the rest of networking types.
> > > Two unpriv types SOCKET_FILTER and CGROUP_SKB is the only exception.
> >
> > Ok, this is the controversy. It made sense to restrict SK_REUSEPORT
> > programs in the past, because programs needed CAP_NET_ADMIN to create
> > SOCKARRAY anyway.
>
> Not quite. Currently sockarray needs CAP_SYS_ADMIN to create
> which makes little sense from security pov.
> CAP_BPF relaxes it CAP_BPF or CAP_SYS_ADMIN.
>
> > Now we change this and CAP_BPF is sufficient for
> > maps - I don't see why CAP_BPF is not sufficient for SK_REUSEPORT
> > programs. From a user point of view I don't get why this additional
> > CAP_NET_ADMIN is needed.
>
> That actually bring another point. I'm not changing sock_map,
> sock_hash, dev_map requirements yet. All three still require CAP_NET_ADMIN.
> We can relax them to CAP_BPF _or_ CAP_NET_ADMIN in the future,
> but I'd like to do that in the follow up.

Agreed, we can discuss relaxation of SOCKMAP in the future.

> > > > I think it's a good opportunity to relax
> > > > this CAP_SYS_ADMIN requirement. I think the presence of CAP_BPF should
> > > > be sufficient for loading BPF_PROG_TYPE_SK_REUSEPORT.
> > > >
> > > > Our specific use case is simple - we want an application program -
> > > > like nginx - to control REUSEPORT programs. We will grant it CAP_BPF,
> > > > but we don't want to grant it CAP_SYS_ADMIN.
> > >
> > > You'll be able to grant nginx CAP_BPF + CAP_NET_ADMIN to load SK_REUSEPORT
> > > and unpriv child process will be able to attach just like before if
> > > it has right FDs.
> > > I suspect your load balancer needs CAP_NET_ADMIN already anyway due to
> > > use of XDP and TC progs.
> > > So granting CAP_BPF + CAP_NET_ADMIN should cover all bpf prog needs.
> > > Does it address your concern?
> >
> > Load balancer (XDP+TC) is another layer and permissions there are not
> > a problem. The specific issue is nginx (port 443) and QUIC. QUIC is
> > UDP and due to the nginx design we must use REUSEPORT groups to
> > balance the load across workers. This is fine and could be done with a
> > simple SOCK_FILTER - we don't need to grant nginx any permissions,
> > apart from CAP_NET_BIND_SERVICE.
> >
> > We would like to make the REUSEPORT program more complex to take
> > advantage of REUSEPORT_EBPF for stickyness (restarting server without
> > interfering with existing flows), we are happy to grant nginx CAP_BPF,
> > but we are not happy to grant it CAP_NET_ADMIN. Requiring this CAP for
> > REUSEPORT severely restricts the API usability for us.
> >
> > In my head REUSEPORT_EBPF is much closer to SOCKET_FILTER. I
> > understand why it needed capabilities before (map creation) and I
> > argue these reasons go away in CAP_BPF world. I assume that any
> > service (with CAP_BPF) should be able to use reuseport to distribute
> > packets within its own sockets. Let me know if I'm missing something.
>
> Fair enough. We can include SK_REUSEPORT prog type as part of CAP_BPF alone.
> But will it truly achieve what you want?

It will make the security model much more useful and sane for me and
other users of stuff that depends on SK_REUSEPORT (like nginx + UDP).
So yes, long-term it will help. Thanks.

> You still need CAP_NET_ADMIN for sock_hash which you're using.
> Are you saying it's part of the different process that has that cap_net_admin
> and nginx will be fine with cap_bpf + cap_net_bind_service ?

At this moment good old SOCKARRAY is sufficient. Having both SOCKARRAY
and SK_REUSEPORT_EBPF depend only on CAP_BPF is a good start. Thanks
for considering that.

We can discuss relaxation of SOCKMAP in the future.

Marek
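
For reference, the capability requirements discussed in this thread can be summarised as a small lookup table. The sketch below is only a model of what was said above, not kernel code: the stage names, the `REQUIREMENTS` table and the `allowed()` helper are hypothetical, and the assumption that CAP_SYS_ADMIN keeps working for SK_REUSEPORT loading after the patch is mine (the thread states it explicitly only for sockarray creation).

```python
# Illustrative model of the capability rules discussed in this thread.
# All names here (stages, REQUIREMENTS, allowed) are hypothetical helpers.

BEFORE = "before the patch set"
AS_POSTED = "patch set as posted"
AS_AGREED = "SK_REUSEPORT moved under CAP_BPF alone"

NONE = frozenset()                       # unprivileged: nothing required
SYS_ADMIN = frozenset({"CAP_SYS_ADMIN"})
BPF = frozenset({"CAP_BPF"})
NET_ADMIN = frozenset({"CAP_NET_ADMIN"})

# Each value is a list of alternatives; holding every capability in any
# one alternative is enough.
REQUIREMENTS = {
    "SOCKET_FILTER prog": {BEFORE: [NONE], AS_POSTED: [NONE], AS_AGREED: [NONE]},
    "SK_REUSEPORT prog": {
        BEFORE: [SYS_ADMIN],
        # Assumption: CAP_SYS_ADMIN remains sufficient after the patch.
        AS_POSTED: [BPF | NET_ADMIN, SYS_ADMIN],
        AS_AGREED: [BPF, SYS_ADMIN],
    },
    "sockarray map": {
        BEFORE: [SYS_ADMIN],
        AS_POSTED: [BPF, SYS_ADMIN],
        AS_AGREED: [BPF, SYS_ADMIN],
    },
    # sock_map, sock_hash and dev_map are left unchanged for now.
    "sock_hash map": {
        BEFORE: [NET_ADMIN], AS_POSTED: [NET_ADMIN], AS_AGREED: [NET_ADMIN],
    },
}


def allowed(obj, stage, caps):
    """Return True if a process holding `caps` may create/load `obj`."""
    held = set(caps)
    return any(need <= held for need in REQUIREMENTS[obj][stage])


# The nginx case from the thread: CAP_BPF plus CAP_NET_BIND_SERVICE only.
nginx_caps = {"CAP_BPF", "CAP_NET_BIND_SERVICE"}
for obj in REQUIREMENTS:
    print(obj, {stage: allowed(obj, stage, nginx_caps)
                for stage in (BEFORE, AS_POSTED, AS_AGREED)})
```

Under this model, nginx with CAP_BPF can create the sockarray in both patched stages, but can load the SK_REUSEPORT program only once that prog type is covered by CAP_BPF alone; sock_hash stays out of reach without CAP_NET_ADMIN.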