Hi,

On 12/16/2023 10:30 AM, Yonghong Song wrote:
> Commit 41a5db8d8161 ("Add support for non-fix-size percpu mem allocation")
> added support for non-fix-size percpu memory allocation.
> Such allocation will allocate percpu memory for all buckets on all
> cpus and the memory consumption is in the order to quadratic.
> For example, let us say, 4 cpus, unit size 16 bytes, so each
> cpu has 16 * 4 = 64 bytes, with 4 cpus, total will be 64 * 4 = 256 bytes.
> Then let us say, 8 cpus with the same unit size, each cpu
> has 16 * 8 = 128 bytes, with 8 cpus, total will be 128 * 8 = 1024 bytes.
> So if the number of cpus doubles, the number of memory consumption
> will be 4 times. So for a system with large number of cpus, the
> memory consumption goes up quickly with quadratic order.
> For example, for 4KB percpu allocation, 128 cpus. The total memory
> consumption will 4KB * 128 * 128 = 64MB. Things will become
> worse if the number of cpus is bigger (e.g., 512, 1024, etc.)

SNIP

> +__init int bpf_mem_alloc_percpu_init(struct bpf_mem_alloc *ma)
> +{
> +	struct bpf_mem_caches __percpu *pcc;
> +
> +	pcc = __alloc_percpu_gfp(sizeof(struct bpf_mem_caches), 8, GFP_KERNEL);
> +	if (!pcc)
> +		return -ENOMEM;
> +
> +	ma->caches = pcc;
> +	ma->percpu = true;
> +	return 0;
> +}
> +
> +int bpf_mem_alloc_percpu_unit_init(struct bpf_mem_alloc *ma, int size)
> +{
> +	int cpu, i, err = 0, unit_size, percpu_size;
> +	struct bpf_mem_caches *cc, __percpu *pcc;
> +	struct obj_cgroup *objcg;
> +	struct bpf_mem_cache *c;
> +
> +	i = bpf_mem_cache_idx(size);
> +	if (i < 0)
> +		return -EINVAL;
> +
> +	/* room for llist_node and per-cpu pointer */
> +	percpu_size = LLIST_NODE_SZ + sizeof(void *);
> +
> +	pcc = ma->caches;
> +	unit_size = sizes[i];
> +
> +#ifdef CONFIG_MEMCG_KMEM
> +	objcg = get_obj_cgroup_from_current();
> +#endif

For bpf_global_percpu_ma, we also need to account the allocated memory to
the root memory cgroup, just like bpf_global_ma does, don't we?
So it seems that we need to initialize c->objcg early, in
bpf_mem_alloc_percpu_init().

> +	for_each_possible_cpu(cpu) {
> +		cc = per_cpu_ptr(pcc, cpu);
> +		c = &cc->cache[i];
> +		if (cpu == 0 && c->unit_size)
> +			goto out;
> +
> +		c->unit_size = unit_size;
> +		c->objcg = objcg;
> +		c->percpu_size = percpu_size;
> +		c->tgt = c;
> +
> +		init_refill_work(c);
> +		prefill_mem_cache(c, cpu);
> +
> +		if (cpu == 0) {
> +			err = check_obj_size(c, i);
> +			if (err) {
> +				drain_mem_cache(c);
> +				memset(c, 0, sizeof(*c));

I also forgot about c->objcg. The objcg reference may be leaked if we do
memset() here.

> +				goto out;
> +			}
> +		}
> +	}
> +
> +out:
> +	return err;
> +}
> +
> .
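
To illustrate the objcg concern in the error path, here is a minimal
user-space sketch, not kernel code. Every name in it (mock_objcg,
mock_cache, mock_objcg_get, mock_objcg_put, reset_cache_leaky,
reset_cache_fixed, leftover_refs) is a hypothetical stand-in invented for
the illustration; it only models a reference being taken, stored in the
cache, and then wiped by memset(). It assumes the cache owns one reference
at the time of the failure, which may not match how the final patch
manages the objcg lifetime.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for struct obj_cgroup: only a reference count. */
struct mock_objcg {
	int refcnt;
};

/* Hypothetical stand-in for struct bpf_mem_cache. */
struct mock_cache {
	int unit_size;
	struct mock_objcg *objcg;
};

/* Take one reference, as get_obj_cgroup_from_current() would. */
static struct mock_objcg *mock_objcg_get(struct mock_objcg *o)
{
	o->refcnt++;
	return o;
}

/* Drop one reference; a NULL pointer is ignored. */
static void mock_objcg_put(struct mock_objcg *o)
{
	if (o)
		o->refcnt--;
}

/* Error path as quoted: wipe the cache without dropping the reference. */
static void reset_cache_leaky(struct mock_cache *c)
{
	memset(c, 0, sizeof(*c));
}

/* Error path that drops the reference before the pointer is wiped. */
static void reset_cache_fixed(struct mock_cache *c)
{
	mock_objcg_put(c->objcg);
	memset(c, 0, sizeof(*c));
}

/*
 * Take a reference, store it in a cache, run the given error path,
 * and return how many references are still outstanding afterwards.
 */
static int leftover_refs(void (*reset)(struct mock_cache *))
{
	struct mock_objcg cg = { .refcnt = 0 };
	struct mock_cache c = { .unit_size = 16, .objcg = mock_objcg_get(&cg) };

	reset(&c);
	return cg.refcnt;
}
```

Calling leftover_refs(reset_cache_leaky) leaves one reference outstanding,
while leftover_refs(reset_cache_fixed) leaves none. In the kernel the drop
would presumably go through obj_cgroup_put(), either before the memset()
or once at the out label, depending on where the reference ends up being
owned.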