Alexei Starovoitov wrote:

> Add a test that benchmarks different ways of attaching a BPF program
> to a kernel function.
>
> Here are the results for a 2.4GHz x86 CPU on a kernel without
> mitigations:
>
> $ ./test_progs -n 49 -v|grep events
> task_rename base        2743K events per sec
> task_rename kprobe      2419K events per sec
> task_rename kretprobe   1876K events per sec
> task_rename raw_tp      2578K events per sec
> task_rename fentry      2710K events per sec
> task_rename fexit       2685K events per sec
>
> On a kernel with retpoline:
>
> $ ./test_progs -n 49 -v|grep events
> task_rename base        2401K events per sec
> task_rename kprobe      1930K events per sec
> task_rename kretprobe   1485K events per sec
> task_rename raw_tp      2053K events per sec
> task_rename fentry      2351K events per sec
> task_rename fexit       2185K events per sec
>
> All 5 approaches:
> - kprobe/kretprobe in __set_task_comm()
> - raw tracepoint in trace_task_rename()
> - fentry/fexit in __set_task_comm()
> are roughly equivalent.
>
> __set_task_comm() by itself is quite fast, so any extra instructions
> add up. Until BPF trampoline was introduced the fastest mechanism was
> raw tracepoint; kprobe via ftrace was second best. kretprobe is slow
> due to the trap. The new fentry/fexit methods via BPF trampoline are
> clearly the fastest, and the difference is more pronounced with
> retpoline on, since the BPF trampoline doesn't use indirect jumps.
>
> Signed-off-by: Alexei Starovoitov <ast@xxxxxxxxxx>
> ---
>  .../selftests/bpf/prog_tests/test_overhead.c | 142 ++++++++++++++++++
>  .../selftests/bpf/progs/test_overhead.c      |  43 ++++++
>  2 files changed, 185 insertions(+)
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/test_overhead.c
>  create mode 100644 tools/testing/selftests/bpf/progs/test_overhead.c
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/test_overhead.c b/tools/testing/selftests/bpf/prog_tests/test_overhead.c
> new file mode 100644
> index 000000000000..c32aa28bd93f

Acked-by: John Fastabend <john.fastabend@xxxxxxxxx>
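
One note for anyone reproducing this without opening the patch: the
kernel side amounts to five near-empty programs attached at the same
spot, so the delta against the "base" run isolates the cost of each
attach mechanism. A minimal sketch of what such an object could look
like (an illustration, not the contents of progs/test_overhead.c;
section names follow libbpf conventions, and fentry/fexit additionally
assume a kernel built with CONFIG_DEBUG_INFO_BTF=y):

// SPDX-License-Identifier: GPL-2.0
/* Sketch only -- see progs/test_overhead.c in the patch for the real
 * thing.  All five bodies are empty so the measured difference vs. the
 * "base" run is purely attach-mechanism overhead.
 */
#include <linux/ptrace.h>
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("kprobe/__set_task_comm")
int prog1(struct pt_regs *ctx)
{
	return 0;
}

SEC("kretprobe/__set_task_comm")
int prog2(struct pt_regs *ctx)
{
	return 0;
}

SEC("raw_tp/task_rename")
int prog3(struct bpf_raw_tracepoint_args *ctx)
{
	return 0;
}

SEC("fentry/__set_task_comm")
int prog4(void *ctx)
{
	return 0;
}

SEC("fexit/__set_task_comm")
int prog5(void *ctx)
{
	return 0;
}

char _license[] SEC("license") = "GPL";

The user-space side is presumably a tight rename loop; something like
the hypothetical helper below (again not the code from
prog_tests/test_overhead.c) exercises both __set_task_comm() and the
task_rename tracepoint, since prctl(PR_SET_NAME) reaches
__set_task_comm() on every call:

#include <stdio.h>
#include <time.h>
#include <sys/prctl.h>

static void bench(int cnt)
{
	struct timespec t0, t1;
	int i;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < cnt; i++)
		/* alternate two names so each iteration is a real rename */
		prctl(PR_SET_NAME, i & 1 ? "test-A" : "test-B");
	clock_gettime(CLOCK_MONOTONIC, &t1);
	printf("%.0fK events per sec\n",
	       cnt / 1000.0 /
	       ((t1.tv_sec - t0.tv_sec) +
		(t1.tv_nsec - t0.tv_nsec) / 1e9));
}

int main(void)
{
	bench(1000000);
	return 0;
}

Run once with no programs attached to get the "base" number, then once
per attach type with the corresponding program loaded.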