Hi,

On 12/18/2023 2:30 PM, Yonghong Song wrote:
> Typically for percpu map element or data structure, once allocated,
> most operations are lookup or in-place update. Deletions are really
> rare. Currently, for percpu data structure, 4 elements will be
> refilled if the size is <= 256. Let us just do with one element
> for percpu data. For example, for size 256 and 128 cpus, the
> potential saving will be 3 * 256 * 128 * 128 = 12MB.
>
> Acked-by: Hou Tao <houtao1@xxxxxxxxxx>
> Signed-off-by: Yonghong Song <yonghong.song@xxxxxxxxx>
> ---
>  kernel/bpf/memalloc.c | 13 +++++++++----
>  1 file changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
> index 50ab2fecc005..f37998662146 100644
> --- a/kernel/bpf/memalloc.c
> +++ b/kernel/bpf/memalloc.c
> @@ -485,11 +485,16 @@ static void init_refill_work(struct bpf_mem_cache *c)
>
>  static void prefill_mem_cache(struct bpf_mem_cache *c, int cpu)
>  {
> -	/* To avoid consuming memory assume that 1st run of bpf
> -	 * prog won't be doing more than 4 map_update_elem from
> -	 * irq disabled region
> +	int cnt = 1;
> +
> +	/* To avoid consuming memory, for non-percpu allocation, assume that
> +	 * 1st run of bpf prog won't be doing more than 4 map_update_elem from
> +	 * irq disabled region if unit size is less than or equal to 256.
> +	 * For all other cases, let us just do one allocation.
>  	 */
> -	alloc_bulk(c, c->unit_size <= 256 ? 4 : 1, cpu_to_node(cpu), false);
> +	if (!c->percpu_size && c->unit_size <= 256)
> +		cnt = 4;
> +	alloc_bulk(c, cnt, cpu_to_node(cpu), false);
>  }

Another thought about this patch. Once the single prefilled element is
consumed by a call to bpf_percpu_obj_new(), the refill will be triggered,
and this time it will allocate c->batch elements. For a 256-byte
unit_size, c->batch will be 64, so when there is one bpf_percpu_obj_new()
allocation on each CPU, the maximal memory consumption on a 128-CPU host
will be 64 * 256 * 128 * 128 = 256MB.
And my question is: should we adjust low_watermark and high_watermark
accordingly for per-cpu allocation to reduce the memory consumption?

>
>  static int check_obj_size(struct bpf_mem_cache *c, unsigned int idx)