On Mon, Jun 26, 2023 at 10:58 AM Puranjay Mohan <puranjay12@xxxxxxxxx> wrote:
>
> BPF programs currently consume a page each on ARM64. For systems with many BPF
> programs, this adds significant pressure to the instruction TLB. High iTLB
> pressure usually causes slowdowns for the whole system.
>
> Song Liu introduced the BPF prog pack allocator[1] to mitigate the above issue.
> It packs multiple BPF programs into a single huge page. It is currently only
> enabled for the x86_64 BPF JIT.
>
> This patch series enables the BPF prog pack allocator for the ARM64 BPF JIT.
>
> ====================================================
> Performance Analysis of prog pack allocator on ARM64
> ====================================================
>
> To test the performance of the BPF prog pack allocator on ARM64, a stresser
> tool[2] was built. This tool loads 8 BPF programs on the system and triggers
> 5 of them in an infinite loop by doing system calls.
>
> The runner script starts 20 instances of the above, which loads 8*20=160 BPF
> programs on the system, 5*20=100 of which are being constantly triggered.
>
> In this environment, Python-3.8.4 is built and different iTLB metrics are
> collected for the compilation done by gcc-12.2.0.
>
> The source code[3] is configured with the following command:
> ./configure --enable-optimizations --with-ensurepip=install
>
> Then the runner script is executed with the following command:
> ./run.sh "perf stat -e ITLB_WALK,L1I_TLB,INST_RETIRED,iTLB-load-misses -a make -j32"
>
> This builds Python while 160 BPF programs are loaded and 100 are being
> constantly triggered, and measures iTLB related metrics.
>
> The output of the above command is discussed below, before and after enabling
> the BPF prog pack allocator.
>
> The tests were run on qemu-system-aarch64 with 32 cpus, 4G memory, -machine
> virt, -cpu host, and -enable-kvm.
>
> Results
> -------
>
> Before enabling prog pack allocator:
> ------------------------------------
>
> Performance counter stats for 'system wide':
>
>          333278635      ITLB_WALK
>      6762692976558      L1I_TLB
>     25359571423901      INST_RETIRED
>        15824054789      iTLB-load-misses
>
>      189.029769053 seconds time elapsed
>
> After enabling prog pack allocator:
> -----------------------------------
>
> Performance counter stats for 'system wide':
>
>          190333544      ITLB_WALK
>      6712712386528      L1I_TLB
>     25278233304411      INST_RETIRED
>         5716757866      iTLB-load-misses
>
>      185.392650561 seconds time elapsed
>
> Improvements in metrics
> -----------------------
>
> Compilation time                              ---> 1.92% faster
> iTLB-load-misses/sec (less is better)         ---> 63.16% decrease
> ITLB_WALK/1000 INST_RETIRED (less is better)  ---> 42.71% decrease
> ITLB_WALK/L1I_TLB (less is better)            ---> 42.47% decrease
>
> [1] https://lore.kernel.org/bpf/20220204185742.271030-1-song@xxxxxxxxxx/
> [2] https://github.com/puranjaymohan/BPF-Allocator-Bench
> [3] https://www.python.org/ftp/python/3.8.4/Python-3.8.4.tgz
>
> Changes in V3 => V4: Changes only in 3rd patch
> 1. Fix the I-cache maintenance: Clean the data cache and invalidate the
>    I-cache only *after* the instructions have been copied to the ROX region.
>
> Changes in V2 => V3: Changes only in 3rd patch
> 1. Set prog = orig_prog; in the failure path of the
>    bpf_jit_binary_pack_finalize() call.
> 2. Add comments explaining the usage of the offsets in the exception table.
>
> Changes in v1 => v2:
> 1. Make the naming consistent in the 3rd patch:
>    ro_image and image
>    ro_header and header
>    ro_image_ptr and image_ptr
> 2. Use names dst/src in place of addr/opcode in the second patch.
> 3. Add Acked-by: Song Liu <song@xxxxxxxxxx> in the 1st and 2nd patch.
>
> Puranjay Mohan (3):
>   bpf: make bpf_prog_pack allocator portable
>   arm64: patching: Add aarch64_insn_copy()
>   bpf, arm64: use bpf_jit_binary_pack_alloc
>
>  arch/arm64/include/asm/patching.h |   1 +
>  arch/arm64/kernel/patching.c      |  39 ++++++++
>  arch/arm64/net/bpf_jit_comp.c     | 145 +++++++++++++++++++++++++-----
>  kernel/bpf/core.c                 |   8 +-
>  4 files changed, 165 insertions(+), 28 deletions(-)
>
> --
> 2.40.1
>

FWIW Acked-by: Florent Revest <revest@xxxxxxxxxxxx>

Thanks for this Puranjay!
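
For anyone wanting to double-check the percentages in the "Improvements in
metrics" section, here is a minimal sketch (plain Python, not part of the
series) that reproduces them from the quoted counters, assuming the ratios
are the obvious per-second and per-instruction normalizations:

# Derive the improvement percentages from the raw "perf stat" counters
# quoted in the cover letter above.
before = {
    "ITLB_WALK": 333278635,
    "L1I_TLB": 6762692976558,
    "INST_RETIRED": 25359571423901,
    "iTLB-load-misses": 15824054789,
    "seconds": 189.029769053,
}
after = {
    "ITLB_WALK": 190333544,
    "L1I_TLB": 6712712386528,
    "INST_RETIRED": 25278233304411,
    "iTLB-load-misses": 5716757866,
    "seconds": 185.392650561,
}

def pct_decrease(old, new):
    return (old - new) / old * 100

# Compilation time: ~1.92% faster (elapsed time decreased)
print(pct_decrease(before["seconds"], after["seconds"]))

# iTLB-load-misses per second: ~63.16% decrease
print(pct_decrease(before["iTLB-load-misses"] / before["seconds"],
                   after["iTLB-load-misses"] / after["seconds"]))

# ITLB_WALK per 1000 retired instructions: ~42.71% decrease
print(pct_decrease(before["ITLB_WALK"] / before["INST_RETIRED"] * 1000,
                   after["ITLB_WALK"] / after["INST_RETIRED"] * 1000))

# ITLB_WALK / L1I_TLB (walks per TLB lookup): ~42.47% decrease
print(pct_decrease(before["ITLB_WALK"] / before["L1I_TLB"],
                   after["ITLB_WALK"] / after["L1I_TLB"]))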