Re: [RFC PATCH bpf-next v2 2/2] [no_merge] selftests/bpf: Benchmark runtime performance with private stack

On 7/12/24 2:47 PM, Andrii Nakryiko wrote:
> On Fri, Jul 12, 2024 at 1:48 PM Yonghong Song <yonghong.song@xxxxxxxxx> wrote:
>>
>> On 7/12/24 1:16 PM, Alexei Starovoitov wrote:
>>> On Thu, Jul 11, 2024 at 9:42 AM Yonghong Song <yonghong.song@xxxxxxxxx> wrote:
>>>> It is clear that the main overhead is the push/pop of r9 for the
>>>> three calls.
>>>>
>>>> Five runs of the benchmark:
>>>>
>>>> [root@arch-fb-vm1 bpf]# ./benchs/run_bench_private_stack.sh
>>>> no-private-stack:    0.662 ± 0.019M/s (drops 0.000 ± 0.000M/s)
>>>> private-stack:       0.673 ± 0.017M/s (drops 0.000 ± 0.000M/s)
>>>> [root@arch-fb-vm1 bpf]# ./benchs/run_bench_private_stack.sh
>>>> no-private-stack:    0.684 ± 0.005M/s (drops 0.000 ± 0.000M/s)
>>>> private-stack:       0.676 ± 0.008M/s (drops 0.000 ± 0.000M/s)
>>>> [root@arch-fb-vm1 bpf]# ./benchs/run_bench_private_stack.sh
>>>> no-private-stack:    0.673 ± 0.017M/s (drops 0.000 ± 0.000M/s)
>>>> private-stack:       0.683 ± 0.006M/s (drops 0.000 ± 0.000M/s)
>>>> [root@arch-fb-vm1 bpf]# ./benchs/run_bench_private_stack.sh
>>>> no-private-stack:    0.680 ± 0.011M/s (drops 0.000 ± 0.000M/s)
>>>> private-stack:       0.626 ± 0.050M/s (drops 0.000 ± 0.000M/s)
>>>> [root@arch-fb-vm1 bpf]# ./benchs/run_bench_private_stack.sh
>>>> no-private-stack:    0.686 ± 0.007M/s (drops 0.000 ± 0.000M/s)
>>>> private-stack:       0.683 ± 0.003M/s (drops 0.000 ± 0.000M/s)
>>>>
>>>> The performance is very similar between private-stack and no-private-stack.
>>> I'm not so sure.
>>> What is the "perf report" before/after?
>>> Are you sure that the bench spends enough time inside the program itself?
>>> By the look of it, it seems that most of the time will be in hashmap
>>> and syscall overhead.
>>>
>>> You need the batched one that uses a for loop and is attached to a helper.
>>> See commit 7df4e597ea2c ("selftests/bpf: add batched, mostly in-kernel
>>> BPF triggering benchmarks")
>> Okay, I see. The current approach is one trigger per prog run, and
>> each prog run exercises 3 syscalls. I should add a loop to the bpf
>> program so that the bpf program spends the majority of the time. Will do
>> this in the next revision, plus running 'perf report'.
> please also benchmark on real hardware; a VM will not give reliable results

Sure. Will do.

>>>
>>> I think the next version doesn't need the RFC tag. Patch 1 lgtm.

