Hi,

On 12/21/2023 1:00 PM, Yonghong Song wrote:
> Commit 41a5db8d8161 ("Add support for non-fix-size percpu mem allocation")
> added support for non-fix-size percpu memory allocation.
> Such allocation will allocate percpu memory for all buckets on all
> cpus and the memory consumption is in the order to quadratic.
> For example, let us say, 4 cpus, unit size 16 bytes, so each
> cpu has 16 * 4 = 64 bytes, with 4 cpus, total will be 64 * 4 = 256 bytes.
> Then let us say, 8 cpus with the same unit size, each cpu
> has 16 * 8 = 128 bytes, with 8 cpus, total will be 128 * 8 = 1024 bytes.
> So if the number of cpus doubles, the number of memory consumption
> will be 4 times. So for a system with large number of cpus, the
> memory consumption goes up quickly with quadratic order.
> For example, for 4KB percpu allocation, 128 cpus. The total memory
> consumption will 4KB * 128 * 128 = 64MB. Things will become
> worse if the number of cpus is bigger (e.g., 512, 1024, etc.)
>
> In Commit 41a5db8d8161, the non-fix-size percpu memory allocation is
> done in boot time, so for system with large number of cpus, the initial
> percpu memory consumption is very visible. For example, for 128 cpu
> system, the total percpu memory allocation will be at least
> (16 + 32 + 64 + 96 + 128 + 196 + 256 + 512 + 1024 + 2048 + 4096)
> * 128 * 128 = ~138MB.
> which is pretty big. It will be even bigger for larger number of cpus.

SNIP
> +
>  static void drain_mem_cache(struct bpf_mem_cache *c)
>  {
>  	bool percpu = !!c->percpu_size;
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index f13008d27f35..08f9a49cc11c 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -12141,20 +12141,6 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>  				if (meta.func_id == special_kfunc_list[KF_bpf_obj_new_impl] && !bpf_global_ma_set)
>  					return -ENOMEM;
>
> -				if (meta.func_id == special_kfunc_list[KF_bpf_percpu_obj_new_impl]) {
> -					if (!bpf_global_percpu_ma_set) {
> -						mutex_lock(&bpf_percpu_ma_lock);
> -						if (!bpf_global_percpu_ma_set) {
> -							err = bpf_mem_alloc_init(&bpf_global_percpu_ma, 0, true);
> -							if (!err)
> -								bpf_global_percpu_ma_set = true;
> -						}
> -						mutex_unlock(&bpf_percpu_ma_lock);
> -						if (err)
> -							return err;
> -					}
> -				}
> -
>  				if (((u64)(u32)meta.arg_constant.value) != meta.arg_constant.value) {
>  					verbose(env, "local type ID argument must be in range [0, U32_MAX]\n");
>  					return -EINVAL;
> @@ -12175,6 +12161,26 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>  					return -EINVAL;
>  				}
>
> +				if (meta.func_id == special_kfunc_list[KF_bpf_percpu_obj_new_impl]) {
> +					if (!bpf_global_percpu_ma_set) {
> +						mutex_lock(&bpf_percpu_ma_lock);
> +						if (!bpf_global_percpu_ma_set) {
> +							err = bpf_mem_alloc_percpu_init(&bpf_global_percpu_ma);

Because ma->objcg is assigned from get_obj_cgroup_from_current(), I think
the memory accounting will be incorrect, right? Maybe we should pass
objcg to bpf_mem_alloc_percpu_init() explicitly. For the root memcg, I
think the objcg is NULL.

> +							if (!err)
> +								bpf_global_percpu_ma_set = true;
> +						}
> +						mutex_unlock(&bpf_percpu_ma_lock);
> +						if (err)
> +							return err;
> +					}
> +
> +					mutex_lock(&bpf_percpu_ma_lock);
> +					err = bpf_mem_alloc_percpu_unit_init(&bpf_global_percpu_ma, ret_t->size);
> +					mutex_unlock(&bpf_percpu_ma_lock);
> +					if (err)
> +						return err;
> +				}
> +
>  				struct_meta = btf_find_struct_meta(ret_btf, ret_btf_id);
>  				if (meta.func_id == special_kfunc_list[KF_bpf_percpu_obj_new_impl]) {
>  					if (!__btf_type_is_scalar_struct(env, ret_btf, ret_t, 0)) {