On 06/08, Martin Lau wrote:
> On Fri, Jun 07, 2019 at 09:29:13AM -0700, Stanislav Fomichev wrote:
> > Implement new BPF_PROG_TYPE_CGROUP_SOCKOPT program type and
> > BPF_CGROUP_{G,S}ETSOCKOPT cgroup hooks.
> >
> > BPF_CGROUP_SETSOCKOPT gets a read-only view of the setsockopt arguments.
> > BPF_CGROUP_GETSOCKOPT can modify the supplied buffer.
> > Both of them reuse existing PTR_TO_PACKET{,_END} infrastructure.
> >
> > The buffer memory is pre-allocated (because I don't think there is
> > a precedent for working with __user memory from bpf). This might be
> > slow to do for each {s,g}etsockopt call, that's why I've added
> > __cgroup_bpf_prog_array_is_empty that exits early if there is nothing
> > attached to a cgroup. Note, however, that there is a race between
> > __cgroup_bpf_prog_array_is_empty and BPF_PROG_RUN_ARRAY where cgroup
> > program layout might have changed; this should not be a problem
> > because in general there is a race between multiple calls to
> > {s,g}etsockopt and user adding/removing bpf progs from a cgroup.
> >
> > The return code of the BPF program is handled as follows:
> > * 0: EPERM
> > * 1: success, execute kernel {s,g}etsockopt path after BPF prog exits
> > * 2: success, do _not_ execute kernel {s,g}etsockopt path after BPF
> >      prog exits
> >
> > v3:
> > * typos in BPF_PROG_CGROUP_SOCKOPT_RUN_ARRAY comments (Andrii Nakryiko)
> > * reverse christmas tree in BPF_PROG_CGROUP_SOCKOPT_RUN_ARRAY (Andrii
> >   Nakryiko)
> > * use __bpf_md_ptr instead of __u32 for optval{,_end} (Martin Lau)
> > * use BPF_FIELD_SIZEOF() for consistency (Martin Lau)
> > * new CG_SOCKOPT_ACCESS macro to wrap repeated parts
> >
> > v2:
> > * moved bpf_sockopt_kern fields around to remove a hole (Martin Lau)
> > * aligned bpf_sockopt_kern->buf to 8 bytes (Martin Lau)
> > * bpf_prog_array_is_empty instead of bpf_prog_array_length (Martin Lau)
> > * added [0,2] return code check to verifier (Martin Lau)
> > * dropped unused buf[64] from the stack (Martin Lau)
> > * use PTR_TO_SOCKET for bpf_sockopt->sk (Martin Lau)
> > * dropped bpf_target_off from ctx rewrites (Martin Lau)
> > * use return code for kernel bypass (Martin Lau & Andrii Nakryiko)
> >
>
> > diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> > index 1b65ab0df457..4fc8429af6fc 100644
> > --- a/kernel/bpf/cgroup.c
> > +++ b/kernel/bpf/cgroup.c
>
> [ ... ]
>
> > +static const struct bpf_func_proto *
> > +cg_sockopt_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> > +{
> > +	switch (func_id) {
> > +	case BPF_FUNC_sk_fullsock:
> > +		return &bpf_sk_fullsock_proto;
> Maybe my v2 comment has been missed.
>
> sk here (i.e. PTR_TO_SOCKET) must be a fullsock.
> bpf_sk_fullsock() will be a no-op. Hence, there is
> no need to expose bpf_sk_fullsock_proto.
I think I missed the fact that PTR_TO_SOCKET implies fullsock.
Will remove, thanks!
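For reference, the return-code convention from the commit message quoted
above could be consumed by a caller roughly like the sketch below. This is
a standalone userspace illustration, not code from the patch: the names
sketch_verdict and sketch_handle_verdict are hypothetical, and the -EINVAL
fallback is only an assumption for values the verifier's [0,2] check is
expected to reject anyway.

```c
#include <assert.h>	/* static_assert (C11) */
#include <errno.h>
#include <stdbool.h>

/* Hypothetical names for the three return codes in the commit message. */
enum sketch_verdict {
	SKETCH_DENY = 0,		/* 0: EPERM */
	SKETCH_RUN_KERNEL = 1,		/* 1: success, run the kernel path */
	SKETCH_BYPASS_KERNEL = 2,	/* 2: success, skip the kernel path */
};

static_assert(SKETCH_BYPASS_KERNEL == 2, "return codes span [0,2]");

/*
 * Map a program return code to an errno-style result. *run_kernel is set
 * to true only when the regular {s,g}etsockopt path should still run.
 */
int sketch_handle_verdict(int verdict, bool *run_kernel)
{
	*run_kernel = false;

	switch (verdict) {
	case SKETCH_DENY:
		return -EPERM;
	case SKETCH_RUN_KERNEL:
		*run_kernel = true;
		return 0;
	case SKETCH_BYPASS_KERNEL:
		return 0;
	default:
		/* Assumption: unreachable once the verifier enforces [0,2]. */
		return -EINVAL;
	}
}
```

Keeping "run the kernel path" as a separate out-parameter makes the
difference between codes 1 and 2 explicit, since both report success.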
> > +	case BPF_FUNC_sk_storage_get:
> > +		return &bpf_sk_storage_get_proto;
> > +	case BPF_FUNC_sk_storage_delete:
> > +		return &bpf_sk_storage_delete_proto;
> > +#ifdef CONFIG_INET
> > +	case BPF_FUNC_tcp_sock:
> > +		return &bpf_tcp_sock_proto;
> > +#endif
> > +	default:
> > +		return cgroup_base_func_proto(func_id, prog);
> > +	}
> > +}
> > +
> > +static bool cg_sockopt_is_valid_access(int off, int size,
> > +				       enum bpf_access_type type,
> > +				       const struct bpf_prog *prog,
> > +				       struct bpf_insn_access_aux *info)
> > +{
> > +	const int size_default = sizeof(__u32);
> > +
> > +	if (off < 0 || off >= sizeof(struct bpf_sockopt))
> > +		return false;
> > +
> > +	if (off % size != 0)
> > +		return false;
> > +
> > +	if (type == BPF_WRITE) {
> > +		switch (off) {
> > +		case offsetof(struct bpf_sockopt, optlen):
> > +			if (size != size_default)
> > +				return false;
> > +			return prog->expected_attach_type ==
> > +				BPF_CGROUP_GETSOCKOPT;
> > +		default:
> > +			return false;
> > +		}
> > +	}
> > +
> > +	switch (off) {
> > +	case offsetof(struct bpf_sockopt, sk):
> > +		if (size != sizeof(struct bpf_sock *))
> Based on my understanding in commit b7df9ada9a77 ("bpf: fix pointer offsets in context for 32 bit"),
> I think it should be 'size != sizeof(__u64)'
>
> Same for the optval and optval_end below.
Good point. When converting BPF_W to BPF_DW in the tests, I was actually
wondering whether that would work correctly on 32-bit. Thanks for the
commit pointer; all three should indeed always be sizeof(__u64).
> > +			return false;
> > +		info->reg_type = PTR_TO_SOCKET;
> > +		break;
> > +	case bpf_ctx_range(struct bpf_sockopt, optval):
> offsetof(struct bpf_sockopt, optval)
Ack. No narrow loads for the pointers.
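To spell out the point about pointer fields, here is a standalone
userspace sketch (not the patch itself) of an access check that accepts
only full 64-bit loads at the exact pointer offsets. SKETCH_MD_PTR,
struct sketch_sockopt and sketch_ptr_access_ok are hypothetical names;
the macro only imitates the idea of padding each context pointer to 8
bytes, and the field order is illustrative.

```c
#include <assert.h>	/* static_assert (C11) */
#include <stdbool.h>
#include <stddef.h>	/* offsetof */
#include <stdint.h>

/*
 * Assumption: pad every context pointer to 64 bits so the layout is the
 * same on 32-bit and 64-bit hosts.
 */
#define SKETCH_MD_PTR(type, name) \
	union { type name; uint64_t name##_pad; } __attribute__((aligned(8)))

/* Hypothetical stand-in for the program context. */
struct sketch_sockopt {
	SKETCH_MD_PTR(void *, sk);
	SKETCH_MD_PTR(void *, optval);
	SKETCH_MD_PTR(void *, optval_end);
	uint32_t level;
	uint32_t optname;
	uint32_t optlen;
};

/* Each pointer slot occupies 8 bytes whatever sizeof(void *) is. */
static_assert(offsetof(struct sketch_sockopt, optval) == 8,
	      "pointer slots are 64-bit wide");

/*
 * Accept a load from a pointer field only if it starts at the field's
 * exact offset (no narrow loads) and is sizeof(uint64_t) wide, rather
 * than sizeof(void *), which is 4 on 32-bit hosts.
 */
bool sketch_ptr_access_ok(size_t off, size_t size)
{
	switch (off) {
	case offsetof(struct sketch_sockopt, sk):
	case offsetof(struct sketch_sockopt, optval):
	case offsetof(struct sketch_sockopt, optval_end):
		return size == sizeof(uint64_t);
	default:
		return false;	/* not a pointer field */
	}
}
```

Matching on offsetof() rather than on a range is what rules out narrow
loads: an access starting in the middle of a pointer slot never hits one
of the case labels.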
> > +		if (size != sizeof(void *))
> > +			return false;
> > +		info->reg_type = PTR_TO_PACKET;
> > +		break;
> > +	case bpf_ctx_range(struct bpf_sockopt, optval_end):
> offsetof(struct bpf_sockopt, optval_end)
>
> > +		if (size != sizeof(void *))
> > +			return false;
> > +		info->reg_type = PTR_TO_PACKET_END;
> > +		break;
> > +	default:
> > +		if (size != size_default)
> > +			return false;
> > +		break;
> > +	}
> > +	return true;
> > +}
> > +
>
> [ ... ]
>
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index 55bfc941d17a..4652c0a005ca 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -1835,7 +1835,7 @@ BPF_CALL_1(bpf_sk_fullsock, struct sock *, sk)
> >  	return sk_fullsock(sk) ? (unsigned long)sk : (unsigned long)NULL;
> >  }
> >
> > -static const struct bpf_func_proto bpf_sk_fullsock_proto = {
> > +const struct bpf_func_proto bpf_sk_fullsock_proto = {
> As mentioned above, this change is also not needed.
>
> Others LGTM.
Agreed, will not export.
> >  	.func		= bpf_sk_fullsock,
> >  	.gpl_only	= false,
> >  	.ret_type	= RET_PTR_TO_SOCKET_OR_NULL,
> > @@ -5636,7 +5636,7 @@ BPF_CALL_1(bpf_tcp_sock, struct sock *, sk)
> >  	return (unsigned long)NULL;
> >  }
> >