Re: [RFC PATCH bpf-next v2 2/2] [no_merge] selftests/bpf: Benchmark runtime performance with private stack

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Fri, 12 Jul 2024 13:16:49 -0700

On Thu, Jul 11, 2024 at 9:42 AM Yonghong Song <yonghong.song@xxxxxxxxx> wrote:
>
>
> It is clear that the main overhead is the push/pop r9 for
> three calls.
>
> Five runs of the benchmarks:
>
> [root@arch-fb-vm1 bpf]# ./benchs/run_bench_private_stack.sh
> no-private-stack:    0.662 ± 0.019M/s (drops 0.000 ± 0.000M/s)
> private-stack:       0.673 ± 0.017M/s (drops 0.000 ± 0.000M/s)
> [root@arch-fb-vm1 bpf]# ./benchs/run_bench_private_stack.sh
> no-private-stack:    0.684 ± 0.005M/s (drops 0.000 ± 0.000M/s)
> private-stack:       0.676 ± 0.008M/s (drops 0.000 ± 0.000M/s)
> [root@arch-fb-vm1 bpf]# ./benchs/run_bench_private_stack.sh
> no-private-stack:    0.673 ± 0.017M/s (drops 0.000 ± 0.000M/s)
> private-stack:       0.683 ± 0.006M/s (drops 0.000 ± 0.000M/s)
> [root@arch-fb-vm1 bpf]# ./benchs/run_bench_private_stack.sh
> no-private-stack:    0.680 ± 0.011M/s (drops 0.000 ± 0.000M/s)
> private-stack:       0.626 ± 0.050M/s (drops 0.000 ± 0.000M/s)
> [root@arch-fb-vm1 bpf]# ./benchs/run_bench_private_stack.sh
> no-private-stack:    0.686 ± 0.007M/s (drops 0.000 ± 0.000M/s)
> private-stack:       0.683 ± 0.003M/s (drops 0.000 ± 0.000M/s)
>
> The performance is very similar between private-stack and no-private-stack.

I'm not so sure.
What is the "perf report" before/after?
Are you sure that bench spends enough time inside the program itself?
By the look of it, most of the time will be spent in hashmap
and syscall overhead.
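
As a rough cross-check of the quoted numbers, here is a minimal sketch that
averages the five runs. The throughput values are copied from the output
quoted above; the variable names are only illustrative, and this is not part
of the benchmark scripts.

```python
from statistics import mean

# Throughput in M/s, copied from the five quoted runs.
no_ps = [0.662, 0.684, 0.673, 0.680, 0.686]   # no-private-stack
ps = [0.673, 0.676, 0.683, 0.626, 0.683]      # private-stack

mean_no = mean(no_ps)
mean_ps = mean(ps)

# Relative change of private-stack versus no-private-stack, in percent.
delta_pct = (mean_ps - mean_no) / mean_no * 100.0

print(f"no-private-stack mean: {mean_no:.3f} M/s")
print(f"private-stack mean:    {mean_ps:.3f} M/s")
print(f"relative change:       {delta_pct:.1f}%")
```

The means differ by roughly 1%, which is smaller than the per-run spread
reported above (up to ± 0.050M/s), so these runs alone cannot show whether
the program itself got slower or faster.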

You need the batched one that uses a for loop and is attached to a helper.
See commit 7df4e597ea2c ("selftests/bpf: add batched, mostly in-kernel
BPF triggering benchmarks")

I think the next version doesn't need the RFC tag. Patch 1 lgtm.