Re: [PATCH bpf-next v1 2/2] [no_merge] selftests/bpf: Benchmark runtime performance with private stack

On Mon, Jul 15, 2024 at 6:17 PM Yonghong Song <yonghong.song@xxxxxxxxx> wrote:
>
> With 4096 loop iterations per program run, I got
>   $ perf record -- ./bench -w3 -d10 -a --nr-batch-iters=4096 no-private-stack
>     27.89%  bench    [kernel.vmlinux]                  [k] htab_map_hash
>     21.55%  bench    [kernel.vmlinux]                  [k] _raw_spin_lock
>     11.51%  bench    [kernel.vmlinux]                  [k] htab_map_delete_elem
>     10.26%  bench    [kernel.vmlinux]                  [k] htab_map_update_elem
>      4.85%  bench    [kernel.vmlinux]                  [k] __pcpu_freelist_push
>      4.34%  bench    [kernel.vmlinux]                  [k] alloc_htab_elem
>      3.50%  bench    [kernel.vmlinux]                  [k] memcpy_orig
>      3.22%  bench    [kernel.vmlinux]                  [k] __pcpu_freelist_pop
>      2.68%  bench    [kernel.vmlinux]                  [k] bcmp
>      2.52%  bench    [kernel.vmlinux]                  [k] __htab_map_lookup_elem


So the prog itself is not even in the top 10, which means
that the test doesn't measure anything meaningful about the private
stack itself.
It just benchmarks the hash map, and the overhead of the extra push/pop
is invisible.

> +SEC("tp/syscalls/sys_enter_getpgid")
> +int stack0(void *ctx)
> +{
> +       struct data_t key = {}, value = {};
> +       struct data_t *pvalue;
> +       int i;
> +
> +       hits++;
> +       key.d[10] = 5;
> +       value.d[8] = 10;
> +
> +       for (i = 0; i < batch_iters; i++) {
> +               pvalue = bpf_map_lookup_elem(&htab, &key);
> +               if (!pvalue)
> +                       bpf_map_update_elem(&htab, &key, &value, 0);
> +               bpf_map_delete_elem(&htab, &key);
> +       }

Instead of calling helpers that do a lot of work, the test should
call global subprograms or noinline static functions that are nops.
Only then might we see the overhead of push/pop r9.
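
A minimal sketch of what such a nop-calling prog could look like, keeping
the stack0 name and tracepoint from the quoted patch. nop_global() and
nop_static() are hypothetical names, the empty asm is only there so the
compiler does not drop the calls, and the #else branch is a host-side
stand-in so the loop can be sanity-checked with a plain C compiler. It has
not been run against the private stack series.

```c
/* Sketch only: nop_global/nop_static are hypothetical, not from the patch. */
#ifdef __bpf__
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#else
/* Host stand-ins so the file also builds with a plain C compiler. */
#define SEC(name)
#ifndef __noinline
#define __noinline __attribute__((noinline))
#endif
#endif

char _license[] SEC("license") = "GPL";

const volatile int batch_iters = 4096;	/* same count as --nr-batch-iters=4096 */
long hits;

/* Global subprogram that does no real work. */
__noinline int nop_global(int x)
{
	__asm__ __volatile__("" ::: "memory");	/* keep the call from being elided */
	return x;
}

/* Static noinline subprogram, also a nop. */
static __noinline int nop_static(int x)
{
	__asm__ __volatile__("" ::: "memory");
	return x;
}

SEC("tp/syscalls/sys_enter_getpgid")
int stack0(void *ctx)
{
	int i;

	hits++;
	/* Only calls to nops in the loop, no map helpers. */
	for (i = 0; i < batch_iters; i++) {
		(void)nop_global(i);
		(void)nop_static(i);
	}
	return 0;
}
```

With no hash map work in the loop, whatever per-call cost exists should no
longer be hidden behind htab_map_hash and _raw_spin_lock in the profile.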

Once you do that, you'll see that the
+SEC("tp/syscalls/sys_enter_getpgid")
approach has too much overhead
(you don't see it right now because the hashmap dominates).
Please use the approach I mentioned earlier: fentry into
a helper, with another prog calling that helper in a for() loop.
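
A rough sketch of that shape, under assumptions: bpf_get_numa_node_id() is
picked here as the attach point only for illustration (the email does not
name a helper), the prog names and nop_sub() are hypothetical, and the
driver is assumed to be triggered from user space via a test run. The #else
branch is a host-side stand-in that emulates the fentry attachment by
calling the prog directly, so the loop logic can be checked without a kernel.

```c
/* Sketch only: helper choice and all prog names are assumptions. */
#ifdef __bpf__
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#else
/* Host stand-ins so the file also builds with a plain C compiler. */
#define SEC(name)
#ifndef __noinline
#define __noinline __attribute__((noinline))
#endif
int bench_fentry(void *ctx);
/* Emulates "fentry prog runs on every helper call". */
static long bpf_get_numa_node_id(void) { return bench_fentry((void *)0); }
#endif

char _license[] SEC("license") = "GPL";

const volatile int batch_iters = 4096;
long hits;

/* Nop subprogram; the empty asm keeps the call from being elided. */
static __noinline int nop_sub(int x)
{
	__asm__ __volatile__("" ::: "memory");
	return x;
}

/* Prog under test: runs once per call of the helper it is attached to. */
SEC("fentry/bpf_get_numa_node_id")
int bench_fentry(void *ctx)
{
	__sync_fetch_and_add(&hits, 1);
	(void)nop_sub(1);
	return 0;
}

/* Driver: one run from user space triggers the prog batch_iters times. */
SEC("raw_tp")
int driver(void *ctx)
{
	int i;

	for (i = 0; i < batch_iters; i++)
		(void)bpf_get_numa_node_id();	/* attach point */
	return 0;
}
```

The point of the split is that the syscall and tracepoint entry cost is
paid once per batch by the driver, not once per run of the prog being
measured.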

pw-bot: cr




