On Fri, Jun 07, 2019 at 09:29:13AM -0700, Stanislav Fomichev wrote:
> Implement new BPF_PROG_TYPE_CGROUP_SOCKOPT program type and
> BPF_CGROUP_{G,S}ETSOCKOPT cgroup hooks.
>
> BPF_CGROUP_SETSOCKOPT gets a read-only view of the setsockopt arguments.
> BPF_CGROUP_GETSOCKOPT can modify the supplied buffer.
> Both of them reuse existing PTR_TO_PACKET{,_END} infrastructure.
>
> The buffer memory is pre-allocated (because I don't think there is
> a precedent for working with __user memory from bpf). This might be
> slow to do for each {s,g}etsockopt call, that's why I've added
> __cgroup_bpf_prog_array_is_empty that exits early if there is nothing
> attached to a cgroup. Note, however, that there is a race between
> __cgroup_bpf_prog_array_is_empty and BPF_PROG_RUN_ARRAY where cgroup
> program layout might have changed; this should not be a problem
> because in general there is a race between multiple calls to
> {s,g}etsockopt and user adding/removing bpf progs from a cgroup.
>
> The return code of the BPF program is handled as follows:
> * 0: EPERM
> * 1: success, execute kernel {s,g}etsockopt path after BPF prog exits
> * 2: success, do _not_ execute kernel {s,g}etsockopt path after BPF
>      prog exits
>
> v3:
> * typos in BPF_PROG_CGROUP_SOCKOPT_RUN_ARRAY comments (Andrii Nakryiko)
> * reverse christmas tree in BPF_PROG_CGROUP_SOCKOPT_RUN_ARRAY (Andrii
>   Nakryiko)
> * use __bpf_md_ptr instead of __u32 for optval{,_end} (Martin Lau)
> * use BPF_FIELD_SIZEOF() for consistency (Martin Lau)
> * new CG_SOCKOPT_ACCESS macro to wrap repeated parts
>
> v2:
> * moved bpf_sockopt_kern fields around to remove a hole (Martin Lau)
> * aligned bpf_sockopt_kern->buf to 8 bytes (Martin Lau)
> * bpf_prog_array_is_empty instead of bpf_prog_array_length (Martin Lau)
> * added [0,2] return code check to verifier (Martin Lau)
> * dropped unused buf[64] from the stack (Martin Lau)
> * use PTR_TO_SOCKET for bpf_sockopt->sk (Martin Lau)
> * dropped bpf_target_off from ctx rewrites (Martin Lau)
> * use return code for kernel bypass (Martin Lau & Andrii Nakryiko)
>
> diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> index 1b65ab0df457..4fc8429af6fc 100644
> --- a/kernel/bpf/cgroup.c
> +++ b/kernel/bpf/cgroup.c

[ ... ]

> +static const struct bpf_func_proto *
> +cg_sockopt_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> +{
> +	switch (func_id) {
> +	case BPF_FUNC_sk_fullsock:
> +		return &bpf_sk_fullsock_proto;
Maybe my v2 comment was missed.
sk here (i.e. PTR_TO_SOCKET) must be a fullsock, so
bpf_sk_fullsock() will be a no-op.
Hence, there is no need to expose bpf_sk_fullsock_proto.

> +	case BPF_FUNC_sk_storage_get:
> +		return &bpf_sk_storage_get_proto;
> +	case BPF_FUNC_sk_storage_delete:
> +		return &bpf_sk_storage_delete_proto;
> +#ifdef CONFIG_INET
> +	case BPF_FUNC_tcp_sock:
> +		return &bpf_tcp_sock_proto;
> +#endif
> +	default:
> +		return cgroup_base_func_proto(func_id, prog);
> +	}
> +}
> +
> +static bool cg_sockopt_is_valid_access(int off, int size,
> +				       enum bpf_access_type type,
> +				       const struct bpf_prog *prog,
> +				       struct bpf_insn_access_aux *info)
> +{
> +	const int size_default = sizeof(__u32);
> +
> +	if (off < 0 || off >= sizeof(struct bpf_sockopt))
> +		return false;
> +
> +	if (off % size != 0)
> +		return false;
> +
> +	if (type == BPF_WRITE) {
> +		switch (off) {
> +		case offsetof(struct bpf_sockopt, optlen):
> +			if (size != size_default)
> +				return false;
> +			return prog->expected_attach_type ==
> +				BPF_CGROUP_GETSOCKOPT;
> +		default:
> +			return false;
> +		}
> +	}
> +
> +	switch (off) {
> +	case offsetof(struct bpf_sockopt, sk):
> +		if (size != sizeof(struct bpf_sock *))
Based on my understanding of commit b7df9ada9a77 ("bpf: fix pointer offsets in context for 32 bit"),
I think it should be 'size != sizeof(__u64)'.

Same for the optval and optval_end below.
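
For illustration only, here is a userspace sketch (not kernel code) of why
the pointer slots are always 8 bytes wide. The macro mirrors the uapi
__bpf_md_ptr() definition. struct sockopt_ctx_sketch and is_ptr_slot_load()
are made-up names, and the field layout is an assumption rather than the
struct from this patch:

```c
#include <stddef.h>
#include <stdint.h>

/* Same shape as the uapi __bpf_md_ptr(): the unnamed 64-bit bit-field
 * (a GNU extension, accepted by gcc and clang) plus aligned(8) force the
 * slot to 8 bytes even when sizeof(void *) is 4. */
#define __bpf_md_ptr(type, name)	\
union {					\
	type name;			\
	uint64_t :64;			\
} __attribute__((aligned(8)))

/* Hypothetical context layout, for illustration only. */
struct sockopt_ctx_sketch {
	__bpf_md_ptr(void *, sk);
	__bpf_md_ptr(void *, optval);
	__bpf_md_ptr(void *, optval_end);
	int32_t  level;
	int32_t  optname;
	uint32_t optlen;
};

/* The slot offsets do not depend on the host pointer size. */
_Static_assert(offsetof(struct sockopt_ctx_sketch, optval) == 8,
	       "optval slot starts at byte 8");
_Static_assert(offsetof(struct sockopt_ctx_sketch, optval_end) == 16,
	       "optval_end slot starts at byte 16");

/* Hypothetical helper: returns 1 only for a full 8-byte load at the start
 * of a pointer slot. Comparing against sizeof(uint64_t) rather than
 * sizeof(void *) keeps the check identical on 32-bit hosts. */
static int is_ptr_slot_load(size_t off, size_t size)
{
	switch (off) {
	case offsetof(struct sockopt_ctx_sketch, sk):
	case offsetof(struct sockopt_ctx_sketch, optval):
	case offsetof(struct sockopt_ctx_sketch, optval_end):
		return size == sizeof(uint64_t);
	default:
		return 0;
	}
}
```

With a check against sizeof(void *), a 32-bit build would accept 4-byte
loads for these fields even though each slot is 8 bytes wide.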

> +			return false;
> +		info->reg_type = PTR_TO_SOCKET;
> +		break;
> +	case bpf_ctx_range(struct bpf_sockopt, optval):
offsetof(struct bpf_sockopt, optval)

> +		if (size != sizeof(void *))
> +			return false;
> +		info->reg_type = PTR_TO_PACKET;
> +		break;
> +	case bpf_ctx_range(struct bpf_sockopt, optval_end):
offsetof(struct bpf_sockopt, optval_end)

> +		if (size != sizeof(void *))
> +			return false;
> +		info->reg_type = PTR_TO_PACKET_END;
> +		break;
> +	default:
> +		if (size != size_default)
> +			return false;
> +		break;
> +	}
> +	return true;
> +}
> +

[ ... ]

> diff --git a/net/core/filter.c b/net/core/filter.c
> index 55bfc941d17a..4652c0a005ca 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -1835,7 +1835,7 @@ BPF_CALL_1(bpf_sk_fullsock, struct sock *, sk)
>  	return sk_fullsock(sk) ? (unsigned long)sk : (unsigned long)NULL;
>  }
>
> -static const struct bpf_func_proto bpf_sk_fullsock_proto = {
> +const struct bpf_func_proto bpf_sk_fullsock_proto = {
As mentioned above, this change is also not needed.

Others LGTM.

>  	.func		= bpf_sk_fullsock,
>  	.gpl_only	= false,
>  	.ret_type	= RET_PTR_TO_SOCKET_OR_NULL,
> @@ -5636,7 +5636,7 @@ BPF_CALL_1(bpf_tcp_sock, struct sock *, sk)
>  	return (unsigned long)NULL;
>  }
>