On Fri, Jan 8, 2021 at 8:27 PM Stanislav Fomichev <sdf@xxxxxxxxxx> wrote: > > On Fri, Jan 8, 2021 at 11:23 AM Eric Dumazet <edumazet@xxxxxxxxxx> wrote: > > > > On Fri, Jan 8, 2021 at 8:08 PM Stanislav Fomichev <sdf@xxxxxxxxxx> wrote: > > > > > > On Fri, Jan 8, 2021 at 10:41 AM Eric Dumazet <edumazet@xxxxxxxxxx> wrote: > > > > > > > > On Fri, Jan 8, 2021 at 7:26 PM Stanislav Fomichev <sdf@xxxxxxxxxx> wrote: > > > > > > > > > > On Fri, Jan 8, 2021 at 10:10 AM Eric Dumazet <edumazet@xxxxxxxxxx> wrote: > > > > > > > > > > > > On Fri, Jan 8, 2021 at 7:03 PM Stanislav Fomichev <sdf@xxxxxxxxxx> wrote: > > > > > > > > > > > > > > Add custom implementation of getsockopt hook for TCP_ZEROCOPY_RECEIVE. > > > > > > > We skip generic hooks for TCP_ZEROCOPY_RECEIVE and have a custom > > > > > > > call in do_tcp_getsockopt using the on-stack data. This removes > > > > > > > 3% overhead for locking/unlocking the socket. > > > > > > > > > > > > > > Without this patch: > > > > > > > 3.38% 0.07% tcp_mmap [kernel.kallsyms] [k] __cgroup_bpf_run_filter_getsockopt > > > > > > > | > > > > > > > --3.30%--__cgroup_bpf_run_filter_getsockopt > > > > > > > | > > > > > > > --0.81%--__kmalloc > > > > > > > > > > > > > > With the patch applied: > > > > > > > 0.52% 0.12% tcp_mmap [kernel.kallsyms] [k] __cgroup_bpf_run_filter_getsockopt_kern > > > > > > > > > > > > > > > > > > > > > > > > > OK but we are adding yet another indirect call. > > > > > > > > > > > > Can you add a patch on top of it adding INDIRECT_CALL_INET() avoidance ? > > > > > Sure, but do you think it will bring any benefit? > > > > > > > > Sure, avoiding an indirect call might be the same gain than the > > > > lock_sock() avoidance :) > > > > > > > > > We don't have any indirect avoidance in __sys_getsockopt for the > > > > > sock->ops->getsockopt() call. > > > > > If we add it for this new bpf_bypass_getsockopt, we might as well add > > > > > it for sock->ops->getsockopt? > > > > > > > > Well, that is orthogonal to this patch. > > > > As you may know, Google kernels do have a mitigation there already and > > > > Brian may upstream it. > > > I guess my point here was that if I send it out only for bpf_bypass_getsockopt > > > it might look a bit strange because the rest of the getsockopt still > > > suffers the indirect costs. > > > > > > Each new indirect call adds a cost. If you focus on optimizing > > TCP_ZEROCOPY_RECEIVE, > > it is counter intuitive adding an expensive indirect call. > Ok, then let me resend with a mitigation in place and a note > that the rest will be added later. > > > If Brian has plans to upstream the rest, maybe > > > it's better to upstream everything together with some numbers? > > > CC'ing him for his opinion. > > > > I am just saying your point about the other indirect call is already taken care. > > > > > > > > I'm happy to follow up in whatever form is best. I can also resend > > > with INDIRECT_CALL_INET2 if there are no objections in including > > > this version from the start. > > > > > > > INDIRECT_CALL_INET2 seems a strange name to me. > Any suggestion for a better name? I did play with the following: > diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h > index cbba9c9ab073..f7342a30284c 100644 > --- a/include/linux/bpf-cgroup.h > +++ b/include/linux/bpf-cgroup.h > @@ -371,7 +371,9 @@ int bpf_percpu_cgroup_storage_update(struct > bpf_map *map, void *key, > int __ret = retval; \ > if (cgroup_bpf_enabled(BPF_CGROUP_GETSOCKOPT)) \ > if (!(sock)->sk_prot->bpf_bypass_getsockopt || \ > - !(sock)->sk_prot->bpf_bypass_getsockopt(level, optname)) \ > + > !INDIRECT_CALL_INET1((sock)->sk_prot->bpf_bypass_getsockopt, \ > + tcp_bpf_bypass_getsockopt, \ > + level, optname)) \ > __ret = __cgroup_bpf_run_filter_getsockopt( \ > sock, level, optname, optval, optlen, \ > max_optlen, retval); \ > diff --git a/include/linux/indirect_call_wrapper.h > b/include/linux/indirect_call_wrapper.h > index 54c02c84906a..9c3252f7e9bb 100644 > --- a/include/linux/indirect_call_wrapper.h > +++ b/include/linux/indirect_call_wrapper.h > @@ -54,10 +54,13 @@ > #if IS_BUILTIN(CONFIG_IPV6) > #define INDIRECT_CALL_INET(f, f2, f1, ...) \ > INDIRECT_CALL_2(f, f2, f1, __VA_ARGS__) > +#define INDIRECT_CALL_INET1(f, f1, ...) INDIRECT_CALL_1(f, f1, __VA_ARGS__) > #elif IS_ENABLED(CONFIG_INET) > #define INDIRECT_CALL_INET(f, f2, f1, ...) INDIRECT_CALL_1(f, f1, __VA_ARGS__) > +#define INDIRECT_CALL_INET1(f, f1, ...) INDIRECT_CALL_1(f, f1, __VA_ARGS__) > #else > #define INDIRECT_CALL_INET(f, f2, f1, ...) f(__VA_ARGS__) > +#define INDIRECT_CALL_INET1(f, f1, ...) INDIRECT_CALL_1(f, f1, __VA_ARGS__) > #endif > > #endif Yes, or maybe something only focusing on CONFIG_INET to make it clear. diff --git a/include/linux/indirect_call_wrapper.h b/include/linux/indirect_call_wrapper.h index 54c02c84906ab2548a93bacb46f7795a8e136d83..d082aa4bd3ecae52e5998b3ac05deffafcb45de0 100644 --- a/include/linux/indirect_call_wrapper.h +++ b/include/linux/indirect_call_wrapper.h @@ -46,6 +46,12 @@ #define INDIRECT_CALLABLE_SCOPE static #endif +#elif IS_ENABLED(CONFIG_INET) +#define INDIRECT_CALL_INET1(f, f2, f1, ...) INDIRECT_CALL_1(f, f1, __VA_ARGS__) +#else +#define INDIRECT_CALL_INET1(f, f2, f1, ...) f(__VA_ARGS__) +#endif + /* * We can use INDIRECT_CALL_$NR for ipv6 related functions only if ipv6 is * builtin, this macro simplify dealing with indirect calls with only ipv4/ipv6