Re: [PATCH bpf-next v3 4/6] bpf: use execmem_alloc for bpf program and bpf dispatcher

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Wed, 16 Nov 2022 18:10:23 -0800

On Wed, Nov 16, 2022 at 6:04 PM Luis Chamberlain <mcgrof@xxxxxxxxxx> wrote:
>
> On Wed, Nov 16, 2022 at 05:06:19PM -0800, Song Liu wrote:
> > Use execmem_alloc, execmem_free, and execmem_fill instead of
> > bpf_prog_pack_alloc, bpf_prog_pack_free, and bpf_arch_text_copy.
> >
> > execmem_free doesn't require extra size information. Therefore, the free
> > and error handling path can be simplified.
> >
> > There are some tests that show the benefit of execmem_alloc.
> >
> > Run 100 instances of the following benchmark from bpf selftests:
> >   tools/testing/selftests/bpf/bench -w2 -d100 -a trig-kprobe
> > which loads 7 BPF programs, and triggers one of them.
> >
> > Then use perf to monitor TLB related counters:
> >    perf stat -e iTLB-load-misses -e itlb_misses.walk_completed_4k \
> >            -e itlb_misses.walk_completed_2m_4m -a
> >
> > The following results are from a QEMU VM with 32 cores.
> >
> > Before bpf_prog_pack:
> >   iTLB-load-misses: 350k/s
> >   itlb_misses.walk_completed_4k: 90k/s
> >   itlb_misses.walk_completed_2m_4m: 0.1/s
> >
> > With bpf_prog_pack (current upstream):
> >   iTLB-load-misses: 220k/s
> >   itlb_misses.walk_completed_4k: 68k/s
> >   itlb_misses.walk_completed_2m_4m: 0.2/s
> >
> > With execmem_alloc (with this set):
> >   iTLB-load-misses: 185k/s
> >   itlb_misses.walk_completed_4k: 58k/s
> >   itlb_misses.walk_completed_2m_4m: 1/s
>
> Wonderful.
>
> It would be nice to have this integrated into the bpf selftest,

No. Luis, please stop suggesting things that don't make sense.
selftests/bpf does not do performance benchmarking.
We have the 'bench' tool for that.
That's what Song used; it only runs standalone
and is not part of any CI.