On Wed, Jun 5, 2024 at 6:41 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
>
> Hi Alexei,
>
> On 6/5/2024 9:48 PM, Alexei Starovoitov wrote:
> > Hi Hou,
> >
> > Are you still working on qp-trie ?
> > All prerequisites (like bpf_mem_alloc support) have landed.
> > Anything keeping you from respinning this set?
>
> Sorry, it is paused due to my limited time for bpf subsystem recently.
> During the limited time for bpf subsystem, I am still trying to resolve
> the huge memory usage for global bpf_mem_alloc. The problem can be
> demonstrated by using the bpf_ma benchmark [1] and it happens as follows:

that was the issue with per-cpu only, no?

> (1) there are intensive allocation/free calls for global bpf_mem_alloc
> in one period on a specific CPU
> (2) there is not any call afterwards on this CPU
> (3) these two RCU callbacks in bpf memory allocator end, and it will not
> be called anymore, because there is not unit_free()/unit_free_rcu() call
> on the CPU
> (4) but there will be many objects in free_by_rcu and free_by_rcu_ttrace
> which are not freed.

I don't quite see how that can happen.

> I am working on a patch-set which tries to resolve the problem by the
> following two methods:
> (1) track the active refcount of global bpf memory allocator hold by bpf
> programs and bpf maps and invoke a new bpf_mem_alloc_flush() API to
> flush freeable objects in these lists when the active refcount goes down
> as zero.
> (2) try to call call_rcu_tasks_trace() nested if there are freeable
> objects in the free_by_rcu_ttrace, because bpf_mem_alloc_flush may leave
> these freeable objects due to concurrency with __free_by_rcu().

I feel you're seeing something else related to long delays in
rcu_tasks_trace GP or weirdness with per-cpu alloc.

> I hope the RFC patch-set for global bpf memory allocator will be posted
> before next week. After that, I will try to continue my work on qp-trie.

Anyway, at the last LPC there was a discussion to generalize all of
bpf_ma logic and make it part of slab. So I suggest we hold on to any
further changes to bpf_ma. Please prioritize qp-trie. It's more urgent.
At LPC multiple folks requested a good data structure to store
variable length objects.