On Thu, Dec 16, 2021 at 01:21:26PM +0000, Pavel Begunkov wrote:
> On 12/15/21 22:07, Stanislav Fomichev wrote:
> > On Wed, Dec 15, 2021 at 11:55 AM Pavel Begunkov <asml.silence@xxxxxxxxx> wrote:
> > >
> > > On 12/15/21 19:15, Stanislav Fomichev wrote:
> > > > On Wed, Dec 15, 2021 at 10:54 AM Pavel Begunkov <asml.silence@xxxxxxxxx> wrote:
> > > > >
> > > > > On 12/15/21 18:24, sdf@xxxxxxxxxx wrote:
> [...]
> > > > > > I can probably do more experiments on my side once your patch is
> > > > > > accepted. I'm mostly concerned with getsockopt(TCP_ZEROCOPY_RECEIVE).
> > > > > > If you claim there is visible overhead for a direct call then there
> > > > > > should be visible benefit to using CGROUP_BPF_TYPE_ENABLED there as
> > > > > > well.
> > > > >
> > > > > Interesting, sounds getsockopt might be performance sensitive to
> > > > > someone.
> > > > >
> > > > > FWIW, I forgot to mention that for testing tx I'm using io_uring
> > > > > (for both zc and not) with good submission batching.
> > > >
> > > > Yeah, last time I saw 2-3% as well, but it was due to kmalloc, see
> > > > more details in 9cacf81f8161, it was pretty visible under perf.
> > > > That's why I'm a bit skeptical of your claims of direct calls being
> > > > somehow visible in these 2-3% (even skb pulls/pushes are not 2-3%?).
> > >
> > > migrate_disable/enable together were taking somewhat in-between
> > > 1% and 1.5% in profiling, don't remember the exact number. The rest
> > > should be from rcu_read_lock/unlock() in BPF_PROG_RUN_ARRAY_CG_FLAGS()
> > > and other extra bits on the way.
> >
> > You probably have a preemptiple kernel and preemptible rcu which most
> > likely explains why you see the overhead and I won't (non-preemptible
> > kernel in our env, rcu_read_lock is essentially a nop, just a compiler
> > barrier).
>
> Right. For reference tried out non-preemptible, perf shows the function
> taking 0.8% with a NIC and 1.2% with a dummy netdev.
> >
> > > I'm skeptical I'll be able to measure inlining one function,
> > > variability between boots/runs is usually greater and would hide it.
> >
> > Right, that's why I suggested to mirror what we do in set/getsockopt
> > instead of the new extra CGROUP_BPF_TYPE_ENABLED. But I'll leave it up
> > to you, Martin and the rest.

I also suggested to try to stay with one way for fullsock context in v2,
but that is for code readability reasons.

How about calling CGROUP_BPF_TYPE_ENABLED() right next to
cgroup_bpf_enabled() in BPF_CGROUP_RUN_PROG_*SOCKOPT_*() instead?
Both cgroup_bpf_enabled() and CGROUP_BPF_TYPE_ENABLED() check whether
there is any bpf prog to run before proceeding with everything else,
and then I don't need to jump to the non-inline function itself to see
whether there is another prog-array-empty check.

Stan, do you have any concern about an extra inlined sock_cgroup_ptr()
when there is a bpf prog to run for set/getsockopt()? From looking at
__cgroup_bpf_run_filter_*sockopt(), I think it should be mostly noise.
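To make the suggested ordering concrete, here is a minimal userspace
sketch, not kernel code. Every name in it (the model_* helpers, the
structs, the counter) is a hypothetical stand-in invented for
illustration; it only models "global enabled check, then per-cgroup
non-empty check, then the out-of-line call", all done at the inline
call site.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: all names below are hypothetical stand-ins. */
enum attach_type { ATTACH_GETSOCKOPT, ATTACH_SETSOCKOPT, ATTACH_MAX };

struct cgroup_model {
	size_t nr_progs[ATTACH_MAX];	/* progs attached per attach type */
};

struct sock_model {
	struct cgroup_model *cgrp;	/* cgroup this socket belongs to */
};

/* Models the global "any prog of this type attached anywhere" switch. */
static bool global_enabled[ATTACH_MAX];

/* Counts how often the out-of-line path is actually entered. */
static int slow_path_calls;

static inline bool model_bpf_enabled(enum attach_type t)
{
	return global_enabled[t];
}

/* Models the per-cgroup "prog array is not empty" check. */
static inline bool model_type_enabled(const struct sock_model *sk,
				      enum attach_type t)
{
	return sk->cgrp->nr_progs[t] != 0;
}

/* Stands in for the non-inline function that would run the progs. */
static int model_run_filter(struct sock_model *sk, enum attach_type t)
{
	(void)sk;
	(void)t;
	slow_path_calls++;
	return 0;
}

/* Both cheap checks sit side by side before the out-of-line call. */
static inline int model_run_prog_sockopt(struct sock_model *sk,
					 enum attach_type t)
{
	assert(sk && sk->cgrp);
	if (!model_bpf_enabled(t))
		return 0;
	if (!model_type_enabled(sk, t))
		return 0;
	return model_run_filter(sk, t);
}
```

With both checks at the call site, a reader sees every early exit in one
place and does not have to open the out-of-line function to look for
another empty-array test. What this costs in the real code when a prog
is attached is exactly the open question above.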