On Fri, Jul 12, 2024 at 1:48 PM Yonghong Song <yonghong.song@xxxxxxxxx> wrote:
>
>
> On 7/12/24 1:16 PM, Alexei Starovoitov wrote:
> > On Thu, Jul 11, 2024 at 9:42 AM Yonghong Song <yonghong.song@xxxxxxxxx> wrote:
> >>
> >> It is clear that the main overhead is the push/pop r9 for
> >> three calls.
> >>
> >> Five runs of the benchmarks:
> >>
> >> [root@arch-fb-vm1 bpf]# ./benchs/run_bench_private_stack.sh
> >> no-private-stack: 0.662 ± 0.019M/s (drops 0.000 ± 0.000M/s)
> >> private-stack: 0.673 ± 0.017M/s (drops 0.000 ± 0.000M/s)
> >> [root@arch-fb-vm1 bpf]# ./benchs/run_bench_private_stack.sh
> >> no-private-stack: 0.684 ± 0.005M/s (drops 0.000 ± 0.000M/s)
> >> private-stack: 0.676 ± 0.008M/s (drops 0.000 ± 0.000M/s)
> >> [root@arch-fb-vm1 bpf]# ./benchs/run_bench_private_stack.sh
> >> no-private-stack: 0.673 ± 0.017M/s (drops 0.000 ± 0.000M/s)
> >> private-stack: 0.683 ± 0.006M/s (drops 0.000 ± 0.000M/s)
> >> [root@arch-fb-vm1 bpf]# ./benchs/run_bench_private_stack.sh
> >> no-private-stack: 0.680 ± 0.011M/s (drops 0.000 ± 0.000M/s)
> >> private-stack: 0.626 ± 0.050M/s (drops 0.000 ± 0.000M/s)
> >> [root@arch-fb-vm1 bpf]# ./benchs/run_bench_private_stack.sh
> >> no-private-stack: 0.686 ± 0.007M/s (drops 0.000 ± 0.000M/s)
> >> private-stack: 0.683 ± 0.003M/s (drops 0.000 ± 0.000M/s)
> >>
> >> The performance is very similar between private-stack and no-private-stack.
> > I'm not so sure.
> > What is the "perf report" before/after?
> > Are you sure that bench spends enough time inside the program itself?
> > By the look of it it seems that most of the time will be in hashmap
> > and syscall overhead.
> >
> > You need that batch's one that uses for loop and attached to a helper.
> > See commit 7df4e597ea2c ("selftests/bpf: add batched, mostly in-kernel
> > BPF triggering benchmarks")
>
> Okay, I see. The current approach is one trigger, one prog run where
> each prog run exercise 3 syscalls. I should add a loop to the bpf
> program to make bpf program spends majority of time. Will do this
> in the next revision, plus running 'perf report'.

please also benchmark on real hardware, VM will not give reliable results

>
> >
> > I think the next version doesn't need RFC tag. patch 1 lgtm.
>
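
As a rough sanity check on the five runs quoted above, the per-variant averages can be computed with a few lines of Python. This is only an illustrative sketch and not part of the selftests: the throughput values are copied from the quoted output, the variable and function names are made up, and averaging the reported per-run means ignores each run's own ± spread.

```python
from statistics import mean, stdev

# Throughput in M/s, copied from the five quoted benchmark runs.
no_private_stack = [0.662, 0.684, 0.673, 0.680, 0.686]
private_stack = [0.673, 0.676, 0.683, 0.626, 0.683]


def summarize(samples):
    """Return (mean, sample standard deviation) for a list of throughputs."""
    return mean(samples), stdev(samples)


base_mean, base_sd = summarize(no_private_stack)
priv_mean, priv_sd = summarize(private_stack)

# Relative difference of private-stack against the baseline, in percent.
delta_pct = (priv_mean - base_mean) / base_mean * 100.0

print(f"no-private-stack: {base_mean:.3f} +/- {base_sd:.3f} M/s")
print(f"private-stack:    {priv_mean:.3f} +/- {priv_sd:.3f} M/s")
print(f"delta:            {delta_pct:+.2f}%")
```

The gap between the two averages is smaller than the run-to-run spread of the private-stack numbers, so these runs alone cannot separate the two variants. That fits the concern that most of the time is spent outside the program.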