On 12/15/21 22:07, Stanislav Fomichev wrote:
On Wed, Dec 15, 2021 at 11:55 AM Pavel Begunkov <asml.silence@xxxxxxxxx> wrote:
On 12/15/21 19:15, Stanislav Fomichev wrote:
On Wed, Dec 15, 2021 at 10:54 AM Pavel Begunkov <asml.silence@xxxxxxxxx> wrote:
On 12/15/21 18:24, sdf@xxxxxxxxxx wrote:
[...]
I can probably do more experiments on my side once your patch is
accepted. I'm mostly concerned with getsockopt(TCP_ZEROCOPY_RECEIVE).
If you claim there is visible overhead for a direct call then there
should be visible benefit to using CGROUP_BPF_TYPE_ENABLED there as
well.
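For illustration only, the idea of a per-type "enabled" gate in front of a hook can be sketched in user-space C. This is a minimal sketch and not the kernel implementation: the names hook_type, hook_enabled, run_hook and run_hook_slow are hypothetical, and the plain bool array only stands in for whatever mechanism the kernel uses (a static key would patch the branch out rather than load a flag).

```c
#include <stdbool.h>

/* Hypothetical hook types, standing in for cgroup BPF attach types. */
enum hook_type { HOOK_EGRESS, HOOK_GETSOCKOPT, HOOK_MAX };

/* Per-type "is anything attached?" flags (assumption: plain bools,
 * where the kernel would use a cheaper patched branch). */
static bool hook_enabled[HOOK_MAX];

/* Counts how often the out-of-line path was actually entered. */
static unsigned long slow_path_calls;

/* Out-of-line slow path: represents the direct call whose cost
 * is being debated in the thread. */
static int run_hook_slow(enum hook_type type, int value)
{
    (void)type;
    slow_path_calls++;
    return value;
}

/* Inline fast path: skip the call entirely when nothing is attached
 * for this hook type. */
static inline int run_hook(enum hook_type type, int value)
{
    if (!hook_enabled[type])
        return 0;
    return run_hook_slow(type, value);
}
```

With no hook enabled, run_hook() returns 0 without ever entering run_hook_slow(); enabling one type leaves the other types on the cheap path.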
Interesting, sounds like getsockopt might be performance-sensitive for
someone.
FWIW, I forgot to mention that for testing tx I'm using io_uring
(for both zc and not) with good submission batching.
Yeah, last time I saw 2-3% as well, but it was due to kmalloc, see
more details in 9cacf81f8161, it was pretty visible under perf.
That's why I'm a bit skeptical of your claims of direct calls being
somehow visible in these 2-3% (even skb pulls/pushes are not 2-3%?).
migrate_disable/enable together were taking somewhere between
1% and 1.5% in profiling, I don't remember the exact number. The rest
should be from rcu_read_lock/unlock() in BPF_PROG_RUN_ARRAY_CG_FLAGS()
and other extra bits on the way.
You probably have a preemptible kernel and preemptible rcu, which most
likely explains why you see the overhead and I don't (non-preemptible
kernel in our env, rcu_read_lock is essentially a nop, just a compiler
barrier).
Right. For reference, I tried a non-preemptible kernel: perf shows the
function taking 0.8% with a NIC and 1.2% with a dummy netdev.
I'm skeptical I'll be able to measure inlining one function,
variability between boots/runs is usually greater and would hide it.
Right, that's why I suggested to mirror what we do in set/getsockopt
instead of the new extra CGROUP_BPF_TYPE_ENABLED. But I'll leave it up
to you, Martin and the rest.
--
Pavel Begunkov