On Fri, Jan 08, 2021 at 01:02:21PM -0800, Stanislav Fomichev wrote:
> Add custom implementation of getsockopt hook for TCP_ZEROCOPY_RECEIVE.
> We skip generic hooks for TCP_ZEROCOPY_RECEIVE and have a custom
> call in do_tcp_getsockopt using the on-stack data. This removes
> 3% overhead for locking/unlocking the socket.
>
> Without this patch:
>      3.38%     0.07%  tcp_mmap  [kernel.kallsyms]  [k] __cgroup_bpf_run_filter_getsockopt
>             |
>              --3.30%--__cgroup_bpf_run_filter_getsockopt
>                        |
>                         --0.81%--__kmalloc
>
> With the patch applied:
>      0.52%     0.12%  tcp_mmap  [kernel.kallsyms]  [k] __cgroup_bpf_run_filter_getsockopt_kern
>
> Signed-off-by: Stanislav Fomichev <sdf@xxxxxxxxxx>
> Cc: Martin KaFai Lau <kafai@xxxxxx>
> Cc: Song Liu <songliubraving@xxxxxx>
> Cc: Eric Dumazet <edumazet@xxxxxxxxxx>
> ---
>  include/linux/bpf-cgroup.h                   | 27 +++++++++++--
>  include/linux/indirect_call_wrapper.h        |  6 +++
>  include/net/sock.h                           |  2 +
>  include/net/tcp.h                            |  1 +
>  kernel/bpf/cgroup.c                          | 38 +++++++++++++++++++
>  net/ipv4/tcp.c                               | 14 +++++++
>  net/ipv4/tcp_ipv4.c                          |  1 +
>  net/ipv6/tcp_ipv6.c                          |  1 +
>  net/socket.c                                 |  3 ++
>  .../selftests/bpf/prog_tests/sockopt_sk.c    | 22 +++++++++++
>  .../testing/selftests/bpf/progs/sockopt_sk.c | 15 ++++++++
>  11 files changed, 126 insertions(+), 4 deletions(-)
>

[ ... ]

> diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> index 6ec088a96302..c41bb2f34013 100644
> --- a/kernel/bpf/cgroup.c
> +++ b/kernel/bpf/cgroup.c
> @@ -1485,6 +1485,44 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
>  	sockopt_free_buf(&ctx);
>  	return ret;
>  }
> +
> +int __cgroup_bpf_run_filter_getsockopt_kern(struct sock *sk, int level,
> +					    int optname, void *optval,
> +					    int *optlen, int retval)
> +{
> +	struct cgroup *cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
> +	struct bpf_sockopt_kern ctx = {
> +		.sk = sk,
> +		.level = level,
> +		.optname = optname,
> +		.retval = retval,
> +		.optlen = *optlen,
> +		.optval = optval,
> +		.optval_end = optval + *optlen,
> +	};
> +	int ret;
> +

The current behavior only passes the kernel optval to the bpf prog when
retval == 0. Can you add a few words here explaining the difference and
why it is fine? Just in case some other options want to reuse
__cgroup_bpf_run_filter_getsockopt_kern() in the future.

> +	ret = BPF_PROG_RUN_ARRAY(cgrp->bpf.effective[BPF_CGROUP_GETSOCKOPT],
> +				 &ctx, BPF_PROG_RUN);
> +	if (!ret)
> +		return -EPERM;
> +
> +	if (ctx.optlen > *optlen)
> +		return -EFAULT;
> +
> +	/* BPF programs only allowed to set retval to 0, not some
> +	 * arbitrary value.
> +	 */
> +	if (ctx.retval != 0 && ctx.retval != retval)
> +		return -EFAULT;
> +
> +	/* BPF programs can shrink the buffer, export the modifications.
> +	 */
> +	if (ctx.optlen != 0)
> +		*optlen = ctx.optlen;
> +
> +	return ctx.retval;
> +}
>  #endif
>
>  static ssize_t sysctl_cpy_dir(const struct ctl_dir *dir, char **bufp,

[ ... ]

> diff --git a/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c b/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c
> index b25c9c45c148..6bb18b1d8578 100644
> --- a/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c
> +++ b/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c
> @@ -11,6 +11,7 @@ static int getsetsockopt(void)
>  		char u8[4];
>  		__u32 u32;
>  		char cc[16]; /* TCP_CA_NAME_MAX */
> +		struct tcp_zerocopy_receive zc;

I suspect this won't compile, at least in my setup.
tools/testing/selftests/net/tcp_mmap.c compiles fine for me, though.
I _guess_ that is because the net test includes kernel/usr/include,
whereas, AFAIK, the bpf tests use tools/include/uapi/.

Others LGTM.
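
To make the retval question above concrete, here is a rough user-space
model of the checks the quoted helper performs after the programs have
run. It is only a sketch: struct sockopt_result and
check_getsockopt_result() are hypothetical names for illustration, not
kernel APIs, and "allowed" stands in for the BPF_PROG_RUN_ARRAY()
result. Note that, unlike the generic path described above, nothing
here depends on retval == 0 before the program sees the buffer.

```c
#include <errno.h>

/* Hypothetical stand-in for the two bpf_sockopt_kern fields that the
 * post-run checks read; this is not the kernel structure.
 */
struct sockopt_result {
	int optlen;	/* length as left by the BPF program */
	int retval;	/* return value as left by the BPF program */
};

/* Hypothetical helper mirroring the checks in the quoted hunk.
 * allowed: 0 if the programs rejected the call, non-zero otherwise.
 * optlen:  in/out, the buffer length passed to the programs.
 * retval:  the kernel's own getsockopt return value.
 */
static int check_getsockopt_result(int allowed,
				   const struct sockopt_result *res,
				   int *optlen, int retval)
{
	if (!allowed)
		return -EPERM;

	/* The program may shrink the buffer but never grow it. */
	if (res->optlen > *optlen)
		return -EFAULT;

	/* The program may clear retval to 0 or leave it untouched. */
	if (res->retval != 0 && res->retval != retval)
		return -EFAULT;

	/* Export a shrunk length; 0 leaves the caller's length as is. */
	if (res->optlen != 0)
		*optlen = res->optlen;

	return res->retval;
}
```

With *optlen == 16 and a kernel retval of 0, a program that leaves
optlen at 8 makes the sketch return 0 and shrink *optlen to 8. A
program that grows optlen to 32, or that rewrites retval to some value
other than 0 or the original, gets -EFAULT.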