On Tue, Aug 6, 2024 at 10:31 AM Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> wrote:
>
> On Tue, Aug 6, 2024 at 12:25 AM Jiri Olsa <olsajiri@xxxxxxxxx> wrote:
> >
> > On Mon, Aug 05, 2024 at 09:29:35PM -0700, Andrii Nakryiko wrote:
> > > Add multi-uprobe and multi-uretprobe benchmarks to bench tool.
> > > Multi- and classic uprobes/uretprobes have different low-level
> > > triggering code paths, so it's sometimes important to be able to
> > > benchmark both flavors of uprobes/uretprobes.
> > >
> > > Sample examples from my dev machine below. Single-threaded performance
> > > almost doesn't differ, but with more parallel CPUs triggering the same
> > > uprobe/uretprobe the difference grows. This might be due to [0], but
> > > given the code is slightly different, there could be other sources of
> > > slowdown.
> > >
> > > Note, all these numbers will change due to ongoing work to improve
> > > uprobe/uretprobe scalability (e.g., [1]), but having benchmark like this
> > > is useful for measurements and debugging nevertheless.
> > >
> > > uprobe-nop            ( 1 cpus): 1.020 ± 0.005M/s ( 1.020M/s/cpu)
> > > uretprobe-nop         ( 1 cpus): 0.515 ± 0.009M/s ( 0.515M/s/cpu)
> > > uprobe-multi-nop      ( 1 cpus): 1.036 ± 0.004M/s ( 1.036M/s/cpu)
> > > uretprobe-multi-nop   ( 1 cpus): 0.512 ± 0.005M/s ( 0.512M/s/cpu)
> > >
> > > uprobe-nop            ( 8 cpus): 3.481 ± 0.030M/s ( 0.435M/s/cpu)
> > > uretprobe-nop         ( 8 cpus): 2.222 ± 0.008M/s ( 0.278M/s/cpu)
> > > uprobe-multi-nop      ( 8 cpus): 3.769 ± 0.094M/s ( 0.471M/s/cpu)
> > > uretprobe-multi-nop   ( 8 cpus): 2.482 ± 0.007M/s ( 0.310M/s/cpu)
> > >
> > > uprobe-nop            (16 cpus): 2.968 ± 0.011M/s ( 0.185M/s/cpu)
> > > uretprobe-nop         (16 cpus): 1.870 ± 0.002M/s ( 0.117M/s/cpu)
> > > uprobe-multi-nop      (16 cpus): 3.541 ± 0.037M/s ( 0.221M/s/cpu)
> > > uretprobe-multi-nop   (16 cpus): 2.123 ± 0.026M/s ( 0.133M/s/cpu)
> > >
> > > uprobe-nop            (32 cpus): 2.524 ± 0.026M/s ( 0.079M/s/cpu)
> > > uretprobe-nop         (32 cpus): 1.572 ± 0.003M/s ( 0.049M/s/cpu)
> > > uprobe-multi-nop      (32 cpus): 2.717 ± 0.003M/s ( 0.085M/s/cpu)
> > > uretprobe-multi-nop   (32 cpus): 1.687 ± 0.007M/s ( 0.053M/s/cpu)
> >
> > nice, do you have script for this output?
> > we could add it to benchs/run_bench_uprobes.sh
> >
>
> I keep tuning those scripts to my own needs, so I'm not sure if it's
> worth adding all of them to selftests. It's very similar to what we
> already have, but see the exact script below:
>
> #!/bin/bash
>
> set -eufo pipefail
>
> for p in 1 8 16 32; do
>     for i in uprobe-nop uretprobe-nop uprobe-multi-nop uretprobe-multi-nop; do
>         summary=$(sudo ./bench -w1 -d3 -p$p -a trig-$i | tail -n1)
>         total=$(echo "$summary" | cut -d'(' -f1 | cut -d' ' -f3-)
>         percpu=$(echo "$summary" | cut -d'(' -f2 | cut -d')' -f1 | cut -d'/' -f1)
>         printf "%-21s (%2d cpus): %s (%s/s/cpu)\n" $i $p "$total" "$percpu"
>     done
>     echo
> done

Added this script to commit log while applying.
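
For anyone who wants to see the multi vs. classic gap directly rather than eyeballing the table, here is a rough sketch of a post-processing helper. It is not part of the patch: the function name multi_vs_classic is made up for illustration, and it assumes input lines in exactly the format that the printf in the script above emits.

```shell
#!/bin/bash
# Hypothetical helper, not part of the patch or of selftests.
# Reads lines in the format printed by the script above, e.g.
#   uprobe-nop            ( 8 cpus): 3.481 ± 0.030M/s ( 0.435M/s/cpu)
# and prints the multi/classic total-throughput ratio per CPU count.
multi_vs_classic() {
    LC_ALL=C awk '
    NF == 0 { next }                      # skip blank separator lines
    {
        gsub(/[():]/, " ")                # "( 8 cpus):" -> " 8 cpus "
        name = $1; cpus = $2; total = $4  # $3 is the literal word "cpus"
        if (!(cpus in seen)) { seen[cpus] = 1; order[++n] = cpus }
        tput[name, cpus] = total
    }
    END {
        for (i = 1; i <= n; i++) {
            c = order[i]
            for (k = 1; k <= 2; k++) {
                base = (k == 1) ? "uprobe" : "uretprobe"
                classic = tput[base "-nop", c] + 0
                multi = tput[base "-multi-nop", c] + 0
                # only report pairs where both flavors were measured
                if (classic > 0 && multi > 0)
                    printf "%s %d cpus: multi/classic = %.3f\n", base, c, multi / classic
            }
        }
    }'
}
```

The function reads from stdin, so the output of the script above can be piped straight into it. A ratio above 1.0 means the multi flavor had higher total throughput at that CPU count.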