On Tue, Aug 6, 2024 at 12:25 AM Jiri Olsa <olsajiri@xxxxxxxxx> wrote:
>
> On Mon, Aug 05, 2024 at 09:29:35PM -0700, Andrii Nakryiko wrote:
> > Add multi-uprobe and multi-uretprobe benchmarks to bench tool.
> > Multi- and classic uprobes/uretprobes have different low-level
> > triggering code paths, so it's sometimes important to be able to
> > benchmark both flavors of uprobes/uretprobes.
> >
> > Sample examples from my dev machine below. Single-threaded performance
> > almost doesn't differ, but with more parallel CPUs triggering the same
> > uprobe/uretprobe the difference grows. This might be due to [0], but
> > given the code is slightly different, there could be other sources of
> > slowdown.
> >
> > Note, all these numbers will change due to ongoing work to improve
> > uprobe/uretprobe scalability (e.g., [1]), but having a benchmark like
> > this is useful for measurements and debugging nevertheless.
> >
> > uprobe-nop           ( 1 cpus):    1.020 ± 0.005M/s  ( 1.020M/s/cpu)
> > uretprobe-nop        ( 1 cpus):    0.515 ± 0.009M/s  ( 0.515M/s/cpu)
> > uprobe-multi-nop     ( 1 cpus):    1.036 ± 0.004M/s  ( 1.036M/s/cpu)
> > uretprobe-multi-nop  ( 1 cpus):    0.512 ± 0.005M/s  ( 0.512M/s/cpu)
> >
> > uprobe-nop           ( 8 cpus):    3.481 ± 0.030M/s  ( 0.435M/s/cpu)
> > uretprobe-nop        ( 8 cpus):    2.222 ± 0.008M/s  ( 0.278M/s/cpu)
> > uprobe-multi-nop     ( 8 cpus):    3.769 ± 0.094M/s  ( 0.471M/s/cpu)
> > uretprobe-multi-nop  ( 8 cpus):    2.482 ± 0.007M/s  ( 0.310M/s/cpu)
> >
> > uprobe-nop           (16 cpus):    2.968 ± 0.011M/s  ( 0.185M/s/cpu)
> > uretprobe-nop        (16 cpus):    1.870 ± 0.002M/s  ( 0.117M/s/cpu)
> > uprobe-multi-nop     (16 cpus):    3.541 ± 0.037M/s  ( 0.221M/s/cpu)
> > uretprobe-multi-nop  (16 cpus):    2.123 ± 0.026M/s  ( 0.133M/s/cpu)
> >
> > uprobe-nop           (32 cpus):    2.524 ± 0.026M/s  ( 0.079M/s/cpu)
> > uretprobe-nop        (32 cpus):    1.572 ± 0.003M/s  ( 0.049M/s/cpu)
> > uprobe-multi-nop     (32 cpus):    2.717 ± 0.003M/s  ( 0.085M/s/cpu)
> > uretprobe-multi-nop  (32 cpus):    1.687 ± 0.007M/s  ( 0.053M/s/cpu)
>
> nice, do you have script for this output?
> we could add it to benchs/run_bench_uprobes.sh

I keep tuning those scripts to my own needs, so I'm not sure if it's worth
adding all of them to selftests. It's very similar to what we already have,
but see the exact script below:

#!/bin/bash

set -eufo pipefail

for p in 1 8 16 32; do
    for i in uprobe-nop uretprobe-nop uprobe-multi-nop uretprobe-multi-nop; do
        summary=$(sudo ./bench -w1 -d3 -p$p -a trig-$i | tail -n1)
        total=$(echo "$summary" | cut -d'(' -f1 | cut -d' ' -f3-)
        percpu=$(echo "$summary" | cut -d'(' -f2 | cut -d')' -f1 | cut -d'/' -f1)
        printf "%-21s (%2d cpus): %s (%s/s/cpu)\n" $i $p "$total" "$percpu"
    done
    echo
done

> lgtm
>
> Acked-by: Jiri Olsa <jolsa@xxxxxxxxxx>
>
> jirka
>
> >
> > [0] https://lore.kernel.org/linux-trace-kernel/20240805202803.1813090-1-andrii@xxxxxxxxxx/
> > [1] https://lore.kernel.org/linux-trace-kernel/20240731214256.3588718-1-andrii@xxxxxxxxxx/
> >
> > Signed-off-by: Andrii Nakryiko <andrii@xxxxxxxxxx>
> > ---
> >  tools/testing/selftests/bpf/bench.c           | 12 +++
> >  .../selftests/bpf/benchs/bench_trigger.c      | 81 +++++++++++++++----
> >  .../selftests/bpf/progs/trigger_bench.c       |  7 ++
> >  3 files changed, 85 insertions(+), 15 deletions(-)
> >
> > diff --git a/tools/testing/selftests/bpf/bench.c b/tools/testing/selftests/bpf/bench.c
> > index 90dc3aca32bd..1bd403a5ef7b 100644
> > --- a/tools/testing/selftests/bpf/bench.c
> > +++ b/tools/testing/selftests/bpf/bench.c
> > @@ -520,6 +520,12 @@ extern const struct bench bench_trig_uprobe_push;
> >  extern const struct bench bench_trig_uretprobe_push;
> >  extern const struct bench bench_trig_uprobe_ret;
> >  extern const struct bench bench_trig_uretprobe_ret;
> > +extern const struct bench bench_trig_uprobe_multi_nop;
> > +extern const struct bench bench_trig_uretprobe_multi_nop;
> > +extern const struct bench bench_trig_uprobe_multi_push;
> > +extern const struct bench bench_trig_uretprobe_multi_push;
> > +extern const struct bench bench_trig_uprobe_multi_ret;
> > +extern const struct bench bench_trig_uretprobe_multi_ret;
> >
> > SNIP
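[Editorial note: the `cut` pipeline in the script above can be sanity-checked offline against a mocked summary line, without root or a built bench binary. The `summary` string below is an assumed example shaped like the results quoted in this thread, not real `./bench` output:]

```shell
#!/bin/bash
# Hypothetical summary line, mimicking the shape of the results quoted
# above (assumed format, not captured from a real ./bench run).
summary='Summary: hits 1.020 ± 0.005M/s ( 1.020M/s/cpu)'

# Same field extraction as in the benchmark loop:
# - total:  drop everything from the first '(' on, then drop the first
#           two space-separated words ("Summary:" and "hits")
# - percpu: take the text inside the parentheses, up to the first '/'
total=$(echo "$summary" | cut -d'(' -f1 | cut -d' ' -f3-)
percpu=$(echo "$summary" | cut -d'(' -f2 | cut -d')' -f1 | cut -d'/' -f1)

# Render one result line the way the loop's printf does:
printf "%-21s (%2d cpus): %s (%s/s/cpu)\n" uprobe-nop 1 "$total" "$percpu"
```

Note that `cut` keeps the whitespace adjacent to the delimiters it splits on, so `total` carries a trailing space and `percpu` a leading one; the `printf` in the loop tolerates both.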