On Thu, Dec 17, 2020 at 9:24 AM Stanislav Fomichev <sdf@xxxxxxxxxx> wrote: > > When we attach a bpf program to cgroup/getsockopt any other getsockopt() > syscall starts incurring kzalloc/kfree cost. While, in general, it's > not an issue, sometimes it is, like in the case of TCP_ZEROCOPY_RECEIVE. > TCP_ZEROCOPY_RECEIVE (ab)uses getsockopt system call to implement > fastpath for incoming TCP, we don't want to have extra allocations in > there. > > Let add a small buffer on the stack and use it for small (majority) > {s,g}etsockopt values. I've started with 128 bytes to cover > the options we care about (TCP_ZEROCOPY_RECEIVE which is 32 bytes > currently, with some planned extension to 64 + some headroom > for the future). I don't really know the rule of thumb, but 128 bytes on stack feels too big to me. I would like to hear others' opinions on this. Can we solve the problem with some other mechanisms, e.g. a mempool? [...] > > +static void *sockopt_export_buf(struct bpf_sockopt_kern *ctx) > +{ > + void *p; > + > + if (ctx->optval != ctx->buf) > + return ctx->optval; > + > + /* We've used bpf_sockopt_kern->buf as an intermediary storage, > + * but the BPF program indicates that we need to pass this > + * data to the kernel setsockopt handler. No way to export > + * on-stack buf, have to allocate a new buffer. The caller > + * is responsible for the kfree(). > + */ > + p = kzalloc(ctx->optlen, GFP_USER); > + if (!p) > + return ERR_PTR(-ENOMEM); > + memcpy(p, ctx->optval, ctx->optlen); > + return p; > +} > + > int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level, > int *optname, char __user *optval, > int *optlen, char **kernel_optval) > @@ -1389,8 +1420,14 @@ int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level, > * use original userspace data. > */ > if (ctx.optlen != 0) { > - *optlen = ctx.optlen; > - *kernel_optval = ctx.optval; > + void *buf = sockopt_export_buf(&ctx); I found it is hard to follow the logic here (when to allocate memory, how to fail over, etc.). Do we have plan to reuse sockopt_export_buf()? If not, it is probably cleaner to put the logic in __cgroup_bpf_run_filter_setsockopt()? Thanks, Song [...]