On Wed, Nov 20, 2024 at 5:20 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
>
> Hi Alexei,
>
> On 11/20/2024 9:16 AM, Alexei Starovoitov wrote:
> > On Sun, Nov 17, 2024 at 4:56 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
> >>
> >> +enum {
> >> +	LPM_TRIE_MA_IM = 0,
> >> +	LPM_TRIE_MA_LEAF,
> >> +	LPM_TRIE_MA_CNT,
> >> +};
> >> +
> >>  struct lpm_trie {
> >>  	struct bpf_map map;
> >>  	struct lpm_trie_node __rcu *root;
> >> +	struct bpf_mem_alloc ma[LPM_TRIE_MA_CNT];
> >> +	struct bpf_mem_alloc *im_ma;
> >> +	struct bpf_mem_alloc *leaf_ma;
> > We cannot use bpf_ma-s liberally like that.
> > Freelists are not huge, but we shouldn't be adding new bpf_ma
> > in every map and every use case.
> >
> > bpf_mem_cache_is_mergeable() in the previous patch also
> > leaks implementation details.
> >
> > Can you use bpf_global_ma for all nodes?
>
> Will try. However, there are mainly two differences between
> bpf_global_ma and map specific bpf_mem_alloc. The first one is the
> memory accounting problem. All memories allocated from bpf_global_ma
> will be accounted to the root memory cgroup instead of the current
> memory cgroup (due to the return value of get_memcg()). I think we could
> fix this partially by returning NULL instead of root_mem_cgroup when
> c->objcg is NULL. However, even with the fix, the memory account is
> still inaccurate, because these pre-allocated objects may be used by
> other maps instead of the map which triggers the pre-allocation.

That's a valid point. We ignore this issue in bpf_obj_new and other
places, but if we can account to the memcg correctly, we should do it.

> The
> second one is the freeing of freed objects when destroying the map. For
> a map specific bpf_mem_alloc, most of these freed objects could be freed
> immediately back to slub, However, it is not true for the bpf_global_ma,
> because we could not tell whether the object belongs to a to-be-freed
> map or not. And also we can not drain the bpf_global_ma just like we do
> for bpf_mem_alloc.

I don't think it's a big issue here. Optimizing delays in the free path
is imo premature. The extra complexity is not worth it.

Let's do one bpf_ma for lpm, sized for the LPM_TRIE_MA_LEAF case.
Inner nodes may waste some memory and that's ok.
The whole LPM trie is not efficient anyway.
Micro-optimizing at the bpf_ma level is a small improvement compared to
rewriting the whole LPM map with a more performant and memory-efficient
algorithm.
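
To make the "one bpf_ma sized for leaf nodes" suggestion concrete, here is
a rough userspace sketch of the trade-off. It is not kernel code and not
the patch itself: struct node, struct pool and every helper below are
hypothetical stand-ins (malloc() replaces bpf_mem_alloc, and the node
layout is simplified). The only point is that a single fixed-size
allocator can serve both node kinds if its unit size is the leaf size,
at the cost of value_size unused bytes per inner node.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for an LPM trie node. */
struct node {
	struct node *child[2];
	uint32_t prefixlen;
	uint8_t is_leaf;
	uint8_t data[];		/* key bytes; leaves also store the value */
};

/* Hypothetical stand-in for one per-map allocator with a single unit size. */
struct pool {
	size_t unit_size;	/* always the leaf size */
	struct node *free_list;	/* freed objects, linked through child[0] */
	size_t nr_alloc;	/* objects currently handed out */
};

static size_t inner_size(size_t key_size)
{
	return sizeof(struct node) + key_size;
}

static size_t leaf_size(size_t key_size, size_t value_size)
{
	return sizeof(struct node) + key_size + value_size;
}

static void pool_init(struct pool *p, size_t key_size, size_t value_size)
{
	p->unit_size = leaf_size(key_size, value_size);
	p->free_list = NULL;
	p->nr_alloc = 0;
}

/* Inner and leaf nodes both come from the same leaf-sized pool. */
static struct node *node_alloc(struct pool *p, int is_leaf)
{
	struct node *n = p->free_list;

	if (n) {
		p->free_list = n->child[0];
	} else {
		n = malloc(p->unit_size);
		if (!n)
			return NULL;
	}
	memset(n, 0, p->unit_size);
	n->is_leaf = (uint8_t)is_leaf;
	p->nr_alloc++;
	return n;
}

/* Freed objects go back on the free list and can be reused by either kind. */
static void node_free(struct pool *p, struct node *n)
{
	assert(p->nr_alloc > 0);
	n->child[0] = p->free_list;
	p->free_list = n;
	p->nr_alloc--;
}

/* On map destruction the whole free list can be released at once. */
static void pool_destroy(struct pool *p)
{
	while (p->free_list) {
		struct node *n = p->free_list;

		p->free_list = n->child[0];
		free(n);
	}
}

/* Bytes left unused in every inner node: exactly the value size. */
static size_t inner_waste(const struct pool *p, size_t key_size)
{
	return p->unit_size - inner_size(key_size);
}
```

In this sketch an object freed as an inner node can be handed out again
as a leaf, which is what makes a single free list sufficient. Because
the pool belongs to one map, pool_destroy() can release everything
without asking which map an object came from.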