From: Hou Tao <houtao1@xxxxxxxxxx>

Hi,

As discussed in v1, freed objects in the bpf memory allocator may
currently be reused immediately by a new allocation. This introduces a
use-after-bpf-ma-free problem for non-preallocated hash maps and can
make the lookup procedure return incorrect results. Immediate reuse
also makes it harder to introduce new use cases (e.g. qp-trie).

This patch series tries to solve these problems by introducing
BPF_MA_{REUSE|FREE}_AFTER_RCU_GP in the bpf memory allocator. For
REUSE_AFTER_RCU_GP, freed objects are reused only after one RCU grace
period and may be freed by the bpf memory allocator after another
RCU-tasks-trace grace period. So bpf programs which care about the
reuse problem can use bpf_rcu_read_{lock,unlock}() to access these
objects safely, and for those which don't care, the
use-after-bpf-ma-free is still safe because these objects have not
been freed by the bpf memory allocator.

FREE_AFTER_RCU_GP behaves differently. Instead of making the freed
elements reusable after one RCU GP, it frees them directly back to
slab after one RCU GP, so sleepable bpf programs must use
bpf_rcu_read_{lock,unlock}() to access elements allocated from a
FREE_AFTER_RCU_GP bpf memory allocator.

Personally I prefer FREE_AFTER_RCU_GP because its implementation is
much simpler than REUSE_AFTER_RCU_GP and its memory usage is also
better. But its shortcoming is also obvious, so I want to get some
feedback before putting in more effort.

As usual, comments and suggestions are always welcome.

Change Log:
v3:
 * add BPF_MA_FREE_AFTER_RCU_GP bpf memory allocator
 * Update htab memory benchmark
   * move the benchmark patch to the last patch
   * remove array and useless bpf_map_lookup_elem(&array, ...) in bpf
     programs
   * add synchronization between addition CPU and deletion CPU for
     add_del_on_diff_cpu case to prevent unnecessary loop
   * add the benchmark result for "extra call_rcu + bpf ma"

v2: https://lore.kernel.org/bpf/20230408141846.1878768-1-houtao@xxxxxxxxxxxxxxx/
 * add a benchmark for bpf memory allocator to compare between
   different flavors of bpf memory allocator.
 * implement BPF_MA_REUSE_AFTER_RCU_GP for bpf memory allocator.

v1: https://lore.kernel.org/bpf/20221230041151.1231169-1-houtao@xxxxxxxxxxxxxxx/

Hou Tao (6):
  bpf: Factor out a common helper free_all()
  bpf: Pass bitwise flags to bpf_mem_alloc_init()
  bpf: Introduce BPF_MA_REUSE_AFTER_RCU_GP
  bpf: Introduce BPF_MA_FREE_AFTER_RCU_GP
  bpf: Add two module parameters in htab for memory benchmark
  selftests/bpf: Add benchmark for bpf memory allocator

 include/linux/bpf_mem_alloc.h             |  10 +-
 kernel/bpf/core.c                         |   2 +-
 kernel/bpf/cpumask.c                      |   2 +-
 kernel/bpf/hashtab.c                      |  43 +-
 kernel/bpf/memalloc.c                     | 529 ++++++++++++++++--
 tools/testing/selftests/bpf/Makefile      |   3 +
 tools/testing/selftests/bpf/bench.c       |   4 +
 .../selftests/bpf/benchs/bench_htab_mem.c | 352 ++++++++++++
 .../bpf/benchs/run_bench_htab_mem.sh      |  64 +++
 .../selftests/bpf/progs/htab_mem_bench.c  | 135 +++++
 10 files changed, 1090 insertions(+), 54 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/benchs/bench_htab_mem.c
 create mode 100755 tools/testing/selftests/bpf/benchs/run_bench_htab_mem.sh
 create mode 100644 tools/testing/selftests/bpf/progs/htab_mem_bench.c

-- 
2.29.2
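
For readers who want a concrete picture of the reuse problem described
in this cover letter, here is a minimal userspace sketch. It is only a
toy model and not the kernel implementation: every identifier in it
(toy_pool, toy_alloc(), toy_free(), toy_grace_period_elapsed(), ...) is
hypothetical, and the "grace period" is an explicit function call
rather than real RCU. It only assumes what the text above states:
with immediate reuse a freed object can be handed out again while a
reader still holds a pointer to it, whereas with reuse-after-GP the
object stays untouched until the grace period has elapsed.

```c
#include <assert.h>
#include <stddef.h>

/* Toy userspace model; all names are hypothetical, not kernel API. */
enum toy_policy {
	TOY_REUSE_IMMEDIATELY,	/* freed objects are reusable at once */
	TOY_REUSE_AFTER_GP,	/* freed objects wait for a grace period */
};

struct toy_obj {
	int key;
	struct toy_obj *next;
};

struct toy_pool {
	enum toy_policy policy;
	struct toy_obj slots[4];
	struct toy_obj *free_list;	/* available for allocation */
	struct toy_obj *waiting_list;	/* freed, readers may still see them */
};

static void toy_pool_init(struct toy_pool *p, enum toy_policy policy)
{
	p->policy = policy;
	p->free_list = NULL;
	p->waiting_list = NULL;
	for (size_t i = 0; i < 4; i++) {
		p->slots[i].key = 0;
		p->slots[i].next = p->free_list;
		p->free_list = &p->slots[i];
	}
}

static struct toy_obj *toy_alloc(struct toy_pool *p, int key)
{
	struct toy_obj *obj = p->free_list;

	assert(obj != NULL);	/* the demo never exhausts the pool */
	p->free_list = obj->next;
	obj->key = key;
	obj->next = NULL;
	return obj;
}

static void toy_free(struct toy_pool *p, struct toy_obj *obj)
{
	struct toy_obj **list;

	if (p->policy == TOY_REUSE_IMMEDIATELY)
		list = &p->free_list;
	else
		list = &p->waiting_list;
	obj->next = *list;
	*list = obj;
}

/* Stand-in for the end of a grace period: no reader holds old pointers. */
static void toy_grace_period_elapsed(struct toy_pool *p)
{
	while (p->waiting_list) {
		struct toy_obj *obj = p->waiting_list;

		p->waiting_list = obj->next;
		obj->next = p->free_list;
		p->free_list = obj;
	}
}

/* Returns the key a reader observes through a pointer it looked up
 * before a concurrent delete + insert happened. */
static int toy_reader_sees_key(enum toy_policy policy)
{
	struct toy_pool pool;
	struct toy_obj *reader_view;
	int seen;

	toy_pool_init(&pool, policy);
	reader_view = toy_alloc(&pool, 1);	/* reader looked up key 1 */
	toy_free(&pool, reader_view);		/* writer deletes key 1 */
	toy_alloc(&pool, 2);			/* writer inserts key 2 */
	seen = reader_view->key;		/* reader dereferences */
	toy_grace_period_elapsed(&pool);	/* reader is done */
	return seen;
}
```

In this model toy_reader_sees_key(TOY_REUSE_IMMEDIATELY) returns 2,
i.e. the reader silently observes the element inserted for a different
key, while toy_reader_sees_key(TOY_REUSE_AFTER_GP) returns 1 because
the freed object is not handed out again until the simulated grace
period has ended.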