On Wed, Nov 16, 2022 at 6:04 PM Luis Chamberlain <mcgrof@xxxxxxxxxx> wrote:
>
> On Wed, Nov 16, 2022 at 05:06:19PM -0800, Song Liu wrote:
> > Use execmem_alloc, execmem_free, and execmem_fill instead of
> > bpf_prog_pack_alloc, bpf_prog_pack_free, and bpf_arch_text_copy.
> >
> > execmem_free doesn't require extra size information. Therefore, the free
> > and error handling path can be simplified.
> >
> > There are some tests that show the benefit of execmem_alloc.
> >
> > Run 100 instances of the following benchmark from bpf selftests:
> >   tools/testing/selftests/bpf/bench -w2 -d100 -a trig-kprobe
> > which loads 7 BPF programs, and triggers one of them.
> >
> > Then use perf to monitor TLB related counters:
> >   perf stat -e iTLB-load-misses,itlb_misses.walk_completed_4k, \
> >     itlb_misses.walk_completed_2m_4m -a
> >
> > The following results are from a qemu VM with 32 cores.
> >
> > Before bpf_prog_pack:
> >   iTLB-load-misses: 350k/s
> >   itlb_misses.walk_completed_4k: 90k/s
> >   itlb_misses.walk_completed_2m_4m: 0.1/s
> >
> > With bpf_prog_pack (current upstream):
> >   iTLB-load-misses: 220k/s
> >   itlb_misses.walk_completed_4k: 68k/s
> >   itlb_misses.walk_completed_2m_4m: 0.2/s
> >
> > With execmem_alloc (with this set):
> >   iTLB-load-misses: 185k/s
> >   itlb_misses.walk_completed_4k: 58k/s
> >   itlb_misses.walk_completed_2m_4m: 1/s
>
> Wonderful.
>
> It would be nice to have this integrated into the bpf selftest,

No. Luis please stop suggesting things that don't make sense.
selftest/bpf are not doing performance benchmarking.
We have the 'bench' tool for that.
That's what Song used and it's only running standalone
and not part of any CI.