ping ?

On 4/8/2023 10:18 PM, Hou Tao wrote:
> From: Hou Tao <houtao1@xxxxxxxxxx>
>
> Hi,
>
> As discussed in v1, freed objects in the bpf memory allocator may
> currently be reused immediately by a new allocation. This introduces a
> use-after-bpf-ma-free problem for non-preallocated hash maps and makes
> the lookup procedure return incorrect results. The immediate reuse also
> makes introducing new use cases more difficult (e.g. qp-trie).
>
> The patch series tries to introduce BPF_MA_REUSE_AFTER_RCU_GP to solve
> these problems. For BPF_MA_REUSE_AFTER_RCU_GP, the freed objects are
> reused only after one RCU grace period and may be freed by the bpf
> memory allocator after another RCU-tasks-trace grace period. So bpf
> programs which care about the reuse problem can use
> bpf_rcu_read_{lock,unlock}() to access these freed objects safely, and
> for those which don't care, use-after-bpf-ma-free remains safe because
> these objects have not yet been freed by the bpf memory allocator.
>
> The current implementation is far from perfect, but I think it is ready
> to get some feedback before putting in more effort. The implementation
> mainly focuses on how to speed up the transition from freed elements to
> reusable elements and tries to reduce the risk of OOM.
>
> To accelerate the transition, it dynamically allocates rcu_head and
> calls call_rcu() in a kworker to do the transition. The frequency of
> call_rcu() invocation could be improved by calling call_rcu() in irq
> work, but after doing that, I found the RCU grace period increased a
> lot and I still could not figure out why.
> To reduce the risk of OOM, these reusable elements need to be freed as
> well, but we can not dynamically allocate rcu_head to do that, because
> the RCU-tasks-trace grace period is slower than the RCU grace period.
> So the freeing of reusable elements is just like the freeing in the
> normal bpf memory allocator, but there is one difference: for a
> BPF_MA_REUSE_AFTER_RCU_GP bpf ma, these freeing elements are still
> available for reuse in unit_alloc(). Please see individual patches for
> more details.
>
> Comments and suggestions are always welcome.
>
> Change Log:
> v2:
>   * add a benchmark for bpf memory allocator to compare between
>     different flavors of bpf memory allocator.
>   * implement BPF_MA_REUSE_AFTER_RCU_GP for bpf memory allocator.
> v1: https://lore.kernel.org/bpf/20221230041151.1231169-1-houtao@xxxxxxxxxxxxxxx/
>
> Hou Tao (4):
>   selftests/bpf: Add benchmark for bpf memory allocator
>   bpf: Factor out a common helper free_all()
>   bpf: Pass bitwise flags to bpf_mem_alloc_init()
>   bpf: Introduce BPF_MA_REUSE_AFTER_RCU_GP
>
>  include/linux/bpf_mem_alloc.h                      |   9 +-
>  kernel/bpf/core.c                                  |   2 +-
>  kernel/bpf/cpumask.c                               |   2 +-
>  kernel/bpf/hashtab.c                               |   5 +-
>  kernel/bpf/memalloc.c                              | 390 ++++++++++++++++--
>  tools/testing/selftests/bpf/Makefile               |   3 +
>  tools/testing/selftests/bpf/bench.c                |   4 +
>  .../selftests/bpf/benchs/bench_htab_mem.c          | 273 ++++++++++++
>  .../selftests/bpf/progs/htab_mem_bench.c           | 145 +++++++
>  9 files changed, 785 insertions(+), 48 deletions(-)
>  create mode 100644 tools/testing/selftests/bpf/benchs/bench_htab_mem.c
>  create mode 100644 tools/testing/selftests/bpf/progs/htab_mem_bench.c
>
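
For anyone skimming the thread, the "reuse only after one RCU grace
period" rule from the cover letter can be modeled in userspace roughly
as below. This is only a toy sketch, not the code from the series: the
names toy_ma, toy_free(), toy_gp_elapsed() and toy_alloc() are all
hypothetical, a grace period is simulated by an explicit function call
rather than call_rcu(), and the later freeing after the RCU-tasks-trace
grace period is not modeled at all.

```c
#include <assert.h>
#include <stddef.h>

/* Toy userspace model; none of these names are kernel APIs. */
enum obj_state { OBJ_IN_USE, OBJ_WAIT_GP, OBJ_REUSABLE };

struct obj {
	enum obj_state state;
	struct obj *next;
};

struct toy_ma {
	struct obj *wait_list;  /* freed, but must not be reused yet */
	struct obj *reuse_list; /* freed, and a grace period has elapsed */
};

/* Free an object: park it until a (simulated) grace period has passed. */
static void toy_free(struct toy_ma *ma, struct obj *o)
{
	o->state = OBJ_WAIT_GP;
	o->next = ma->wait_list;
	ma->wait_list = o;
}

/*
 * Stand-in for the RCU callback: every object that was waiting when the
 * grace period ended becomes reusable.
 */
static void toy_gp_elapsed(struct toy_ma *ma)
{
	while (ma->wait_list) {
		struct obj *o = ma->wait_list;

		ma->wait_list = o->next;
		o->state = OBJ_REUSABLE;
		o->next = ma->reuse_list;
		ma->reuse_list = o;
	}
}

/*
 * Allocate only from the reusable list. A real allocator would fall back
 * to fresh memory; the toy simply returns NULL.
 */
static struct obj *toy_alloc(struct toy_ma *ma)
{
	struct obj *o = ma->reuse_list;

	if (!o)
		return NULL;
	ma->reuse_list = o->next;
	o->state = OBJ_IN_USE;
	o->next = NULL;
	return o;
}
```

In this model an object passed to toy_free() cannot come back from
toy_alloc() until toy_gp_elapsed() has run, which is the property a
reader holding a pointer across a lookup would rely on.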