Re: [PATCH bpf-next v3 4/6] bpf: use execmem_alloc for bpf program and bpf dispatcher

On Wed, Nov 16, 2022 at 06:10:23PM -0800, Alexei Starovoitov wrote:
> On Wed, Nov 16, 2022 at 6:04 PM Luis Chamberlain <mcgrof@xxxxxxxxxx> wrote:
> >
> > On Wed, Nov 16, 2022 at 05:06:19PM -0800, Song Liu wrote:
> > > Use execmem_alloc, execmem_free, and execmem_fill instead of
> > > bpf_prog_pack_alloc, bpf_prog_pack_free, and bpf_arch_text_copy.
> > >
> > > execmem_free doesn't require extra size information. Therefore, the free
> > > and error handling path can be simplified.
> > >
> > > There are some tests that show the benefit of execmem_alloc.
> > >
> > > Run 100 instances of the following benchmark from bpf selftests:
> > >   tools/testing/selftests/bpf/bench -w2 -d100 -a trig-kprobe
> > > which loads 7 BPF programs, and triggers one of them.
> > >
> > > Then use perf to monitor TLB related counters:
> > > >    perf stat -e iTLB-load-misses -e itlb_misses.walk_completed_4k \
> > > >            -e itlb_misses.walk_completed_2m_4m -a
> > >
> > > The following results are from a qemu VM with 32 cores.
> > >
> > > Before bpf_prog_pack:
> > >   iTLB-load-misses: 350k/s
> > >   itlb_misses.walk_completed_4k: 90k/s
> > >   itlb_misses.walk_completed_2m_4m: 0.1/s
> > >
> > > With bpf_prog_pack (current upstream):
> > >   iTLB-load-misses: 220k/s
> > >   itlb_misses.walk_completed_4k: 68k/s
> > >   itlb_misses.walk_completed_2m_4m: 0.2/s
> > >
> > > With execmem_alloc (with this set):
> > >   iTLB-load-misses: 185k/s
> > >   itlb_misses.walk_completed_4k: 58k/s
> > >   itlb_misses.walk_completed_2m_4m: 1/s
> >
> > Wonderful.
> >
> > It would be nice to have this integrated into the bpf selftest,
> 
> 
> No. Luis please stop suggesting things that don't make sense.
> selftest/bpf are not doing performance benchmarking.
> We have the 'bench' tool for that.
> That's what Song used and it's only running standalone
> and not part of any CI.

I'm not suggesting to instantiate the VM or crap like that, I'm just
asking for the simple script to run 100 instances. This allows folks
to reproduce results in an easy way.

If you don't want that in selftests/bpf, fine. A simple script in the
commit message, even just a loop in bash, would do if that's all that
was done.
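
Something along these lines is what I have in mind. This is only a sketch,
not the script that produced the numbers above: BENCH, INSTANCES and
run_instances are made-up names, and it assumes the bench binary has
already been built and that this is run from the top of the kernel tree.

```shell
#!/bin/bash
# Hypothetical reproduction helper; not the script behind the quoted numbers.
# BENCH and INSTANCES are assumed names; override them from the environment.
BENCH=${BENCH:-tools/testing/selftests/bpf/bench}
INSTANCES=${INSTANCES:-100}

run_instances() {
	local i
	for i in $(seq 1 "$INSTANCES"); do
		# Same arguments as quoted in the commit message.
		"$BENCH" -w2 -d100 -a trig-kprobe >/dev/null 2>&1 &
	done
	wait	# block until every background instance has exited
}
```

Call run_instances in one shell and run the perf stat command from the
commit message in another while the instances are running.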

  Luis



