Re: [PATCH bpf-next] selftests/bpf: add multi-uprobe benchmarks

On Mon, Aug 05, 2024 at 09:29:35PM -0700, Andrii Nakryiko wrote:
> Add multi-uprobe and multi-uretprobe benchmarks to bench tool.
> Multi- and classic uprobes/uretprobes have different low-level
> triggering code paths, so it's sometimes important to be able to
> benchmark both flavors of uprobes/uretprobes.
> 
> Sample results from my dev machine are below. Single-threaded performance
> barely differs, but with more parallel CPUs triggering the same
> uprobe/uretprobe the difference grows. This might be due to [0], but
> given the code is slightly different, there could be other sources of
> slowdown.
> 
> Note, all these numbers will change due to ongoing work to improve
> uprobe/uretprobe scalability (e.g., [1]), but having a benchmark like this
> is useful for measurements and debugging nevertheless.
> 
> uprobe-nop            ( 1 cpus):    1.020 ± 0.005M/s  (  1.020M/s/cpu)
> uretprobe-nop         ( 1 cpus):    0.515 ± 0.009M/s  (  0.515M/s/cpu)
> uprobe-multi-nop      ( 1 cpus):    1.036 ± 0.004M/s  (  1.036M/s/cpu)
> uretprobe-multi-nop   ( 1 cpus):    0.512 ± 0.005M/s  (  0.512M/s/cpu)
> 
> uprobe-nop            ( 8 cpus):    3.481 ± 0.030M/s  (  0.435M/s/cpu)
> uretprobe-nop         ( 8 cpus):    2.222 ± 0.008M/s  (  0.278M/s/cpu)
> uprobe-multi-nop      ( 8 cpus):    3.769 ± 0.094M/s  (  0.471M/s/cpu)
> uretprobe-multi-nop   ( 8 cpus):    2.482 ± 0.007M/s  (  0.310M/s/cpu)
> 
> uprobe-nop            (16 cpus):    2.968 ± 0.011M/s  (  0.185M/s/cpu)
> uretprobe-nop         (16 cpus):    1.870 ± 0.002M/s  (  0.117M/s/cpu)
> uprobe-multi-nop      (16 cpus):    3.541 ± 0.037M/s  (  0.221M/s/cpu)
> uretprobe-multi-nop   (16 cpus):    2.123 ± 0.026M/s  (  0.133M/s/cpu)
> 
> uprobe-nop            (32 cpus):    2.524 ± 0.026M/s  (  0.079M/s/cpu)
> uretprobe-nop         (32 cpus):    1.572 ± 0.003M/s  (  0.049M/s/cpu)
> uprobe-multi-nop      (32 cpus):    2.717 ± 0.003M/s  (  0.085M/s/cpu)
> uretprobe-multi-nop   (32 cpus):    1.687 ± 0.007M/s  (  0.053M/s/cpu)

nice, do you have a script for this output?
we could add it to benchs/run_bench_uprobes.sh
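
For reference, a minimal sketch of what such a script could look like.
It is not the script that produced the numbers above. The bench flags
(-w, -d, -a, -p), the "trig-" benchmark name prefix, and the
"Summary: hits" prefix on the last output line are assumptions based on
the existing benchs/run_bench_uprobes.sh, and the fmt_line and run_one
helpers are hypothetical names used only for illustration.

```shell
#!/bin/bash
# Sketch only: flags, benchmark names and summary-line layout are
# assumptions, not taken from this patch.

# Print one result line in the layout used in the commit message,
# e.g. "uprobe-nop            ( 1 cpus): <summary>".
fmt_line() {
	local name=$1 cpus=$2 summary=$3
	printf "%-21s (%2d cpus): %s\n" "$name" "$cpus" "$summary"
}

# Run one benchmark with the given number of producer threads and keep
# only the numbers from the final summary line (assumed to start with
# "Summary: hits").
run_one() {
	local name=$1 cpus=$2
	sudo ./bench -w2 -d5 -a -p"$cpus" "trig-$name" |
		tail -n1 | sed 's/^Summary: hits *//'
}

# Only run when the bench binary is present in the current directory.
if [ -x ./bench ]; then
	for cpus in 1 8 16 32; do
		for name in uprobe-nop uretprobe-nop \
			    uprobe-multi-nop uretprobe-multi-nop; do
			fmt_line "$name" "$cpus" "$(run_one "$name" "$cpus")"
		done
		echo
	done
fi
```

Keeping the formatting in its own helper means the CPU counts and the
benchmark list can change without touching the output layout.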

lgtm

Acked-by: Jiri Olsa <jolsa@xxxxxxxxxx>

jirka

> 
>   [0] https://lore.kernel.org/linux-trace-kernel/20240805202803.1813090-1-andrii@xxxxxxxxxx/
>   [1] https://lore.kernel.org/linux-trace-kernel/20240731214256.3588718-1-andrii@xxxxxxxxxx/
> 
> Signed-off-by: Andrii Nakryiko <andrii@xxxxxxxxxx>
> ---
>  tools/testing/selftests/bpf/bench.c           | 12 +++
>  .../selftests/bpf/benchs/bench_trigger.c      | 81 +++++++++++++++----
>  .../selftests/bpf/progs/trigger_bench.c       |  7 ++
>  3 files changed, 85 insertions(+), 15 deletions(-)
> 
> diff --git a/tools/testing/selftests/bpf/bench.c b/tools/testing/selftests/bpf/bench.c
> index 90dc3aca32bd..1bd403a5ef7b 100644
> --- a/tools/testing/selftests/bpf/bench.c
> +++ b/tools/testing/selftests/bpf/bench.c
> @@ -520,6 +520,12 @@ extern const struct bench bench_trig_uprobe_push;
>  extern const struct bench bench_trig_uretprobe_push;
>  extern const struct bench bench_trig_uprobe_ret;
>  extern const struct bench bench_trig_uretprobe_ret;
> +extern const struct bench bench_trig_uprobe_multi_nop;
> +extern const struct bench bench_trig_uretprobe_multi_nop;
> +extern const struct bench bench_trig_uprobe_multi_push;
> +extern const struct bench bench_trig_uretprobe_multi_push;
> +extern const struct bench bench_trig_uprobe_multi_ret;
> +extern const struct bench bench_trig_uretprobe_multi_ret;

SNIP
