On Thu, Jul 11, 2024 at 9:42 AM Yonghong Song <yonghong.song@xxxxxxxxx> wrote:
>
> It is clear that the main overhead is the push/pop r9 for
> three calls.
>
> Five runs of the benchmarks:
>
> [root@arch-fb-vm1 bpf]# ./benchs/run_bench_private_stack.sh
> no-private-stack: 0.662 ± 0.019M/s (drops 0.000 ± 0.000M/s)
> private-stack: 0.673 ± 0.017M/s (drops 0.000 ± 0.000M/s)
> [root@arch-fb-vm1 bpf]# ./benchs/run_bench_private_stack.sh
> no-private-stack: 0.684 ± 0.005M/s (drops 0.000 ± 0.000M/s)
> private-stack: 0.676 ± 0.008M/s (drops 0.000 ± 0.000M/s)
> [root@arch-fb-vm1 bpf]# ./benchs/run_bench_private_stack.sh
> no-private-stack: 0.673 ± 0.017M/s (drops 0.000 ± 0.000M/s)
> private-stack: 0.683 ± 0.006M/s (drops 0.000 ± 0.000M/s)
> [root@arch-fb-vm1 bpf]# ./benchs/run_bench_private_stack.sh
> no-private-stack: 0.680 ± 0.011M/s (drops 0.000 ± 0.000M/s)
> private-stack: 0.626 ± 0.050M/s (drops 0.000 ± 0.000M/s)
> [root@arch-fb-vm1 bpf]# ./benchs/run_bench_private_stack.sh
> no-private-stack: 0.686 ± 0.007M/s (drops 0.000 ± 0.000M/s)
> private-stack: 0.683 ± 0.003M/s (drops 0.000 ± 0.000M/s)
>
> The performance is very similar between private-stack and no-private-stack.

I'm not so sure.
What is the "perf report" before/after?
Are you sure that the bench spends enough time inside the program itself?
By the look of it, most of the time will be in hashmap and syscall overhead.

You need the batched one that uses a for loop and is attached to a helper.
See commit 7df4e597ea2c ("selftests/bpf: add batched, mostly in-kernel BPF triggering benchmarks")

I think the next version doesn't need the RFC tag.
patch 1 lgtm.
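
For reference, a quick sketch that aggregates the five quoted runs, to see how the gap between the two configurations compares with the run-to-run spread. The throughput numbers are copied from the quoted output; the `summarize` helper and the variable names are made up for illustration and are not part of the selftests. This only looks at noise in the reported M/s figures. It says nothing about where the time is spent, which is what "perf report" would show.

```python
import statistics

# Per-run throughput in M/s, copied from the five quoted benchmark runs.
no_private_stack = [0.662, 0.684, 0.673, 0.680, 0.686]
private_stack = [0.673, 0.676, 0.683, 0.626, 0.683]


def summarize(samples):
    """Return (mean, sample standard deviation) of the per-run throughput.

    Hypothetical helper for this sketch only.
    """
    return statistics.mean(samples), statistics.stdev(samples)


mean_no, stdev_no = summarize(no_private_stack)
mean_priv, stdev_priv = summarize(private_stack)

# Gap between the two configurations, absolute and relative to the baseline.
diff = mean_priv - mean_no
rel_diff = diff / mean_no

print(f"no-private-stack: mean {mean_no:.4f} M/s, stdev {stdev_no:.4f}")
print(f"private-stack:    mean {mean_priv:.4f} M/s, stdev {stdev_priv:.4f}")
print(f"difference:       {diff:+.4f} M/s ({rel_diff:+.2%})")
```

The gap between the means comes out smaller than the spread across the private-stack runs, so these five runs cannot distinguish a small regression from noise.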