Hi Alexei,

On 6/6/2024 11:45 AM, Alexei Starovoitov wrote:
> On Wed, Jun 5, 2024 at 6:41 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
>> Hi Alexei,
>>
>> On 6/5/2024 9:48 PM, Alexei Starovoitov wrote:
>>> Hi Hou,
>>>
>>> Are you still working on qp-trie ?
>>> All prerequisites (like bpf_mem_alloc support) have landed.
>>> Anything keeping you from respinning this set?
>> Sorry, it has been paused recently due to my limited time for the bpf
>> subsystem. During that limited time, I have been trying to resolve the
>> huge memory usage of the global bpf_mem_alloc. The problem can be
>> demonstrated by using the bpf_ma benchmark [1] and it happens as follows:
> that was the issue with per-cpu only, no?

No. Both bpf_global_ma and bpf_global_percpu_ma have the same problem.

>
>> (1) there are intensive allocation/free calls for the global bpf_mem_alloc
>> during one period on a specific CPU
>> (2) there are no calls afterwards on this CPU
>> (3) the two RCU callbacks in the bpf memory allocator finish, and they will
>> not be invoked again, because there are no more unit_free()/unit_free_rcu()
>> calls on that CPU
>> (4) but many objects remain in free_by_rcu and free_by_rcu_ttrace
>> and are never freed.
> I don't quite see how that can happen.
>
>> I am working on a patch-set which tries to resolve the problem with the
>> following two methods:
>> (1) track the active refcount of the global bpf memory allocator held by
>> bpf programs and bpf maps, and invoke a new bpf_mem_alloc_flush() API to
>> flush freeable objects in these lists when the active refcount drops
>> to zero.
>> (2) try to call call_rcu_tasks_trace() nested if there are freeable
>> objects in free_by_rcu_ttrace, because bpf_mem_alloc_flush() may leave
>> these freeable objects behind due to concurrency with __free_by_rcu().
> I feel you're seeing something else related to long delays
> in rcu_tasks_trace GP or weirdness with per-cpu alloc.

Er, the rcu_tasks_trace GP is relatively slow, but I think that is because
the artificial alloc/free operations in the bpf_ma benchmark are too fast.

>
>> I hope the RFC patch-set for the global bpf memory allocator will be posted
>> before next week. After that, I will try to continue my work on qp-trie.
> Anyway, at the last LPC there was a discussion to generalize
> all of bpf_ma logic and make it part of slab.
> So I suggest we hold on to any further changes to bpf_ma.

OK. I will postpone the change, but I still think posting an RFC for
discussion may also benefit the generalization of bpf_ma in slub, and I
could do that later.

>
> Please prioritize qp-trie. It's more urgent.
> At LPC multiple folks requested a good data structure to store
> variable length objects.
>

OK. Will do qp-trie first. Could you elaborate on one possible use case
for the "variable length objects" thing?