Hi, On 12/15/2023 3:27 PM, Yonghong Song wrote: > > On 12/14/23 10:50 PM, Yonghong Song wrote: >> >> On 12/14/23 7:19 PM, Hou Tao wrote: >>> >>> On 12/15/2023 8:12 AM, Yonghong Song wrote: >>>> Commit 41a5db8d8161 ("Add support for non-fix-size percpu mem >>>> allocation") >>>> added support for non-fix-size percpu memory allocation. >>>> Such allocation will allocate percpu memory for all buckets on all >>>> cpus and the memory consumption is in the order to quadratic. >>>> For example, let us say, 4 cpus, unit size 16 bytes, so each >>>> cpu has 16 * 4 = 64 bytes, with 4 cpus, total will be 64 * 4 = 256 >>>> bytes. >>>> Then let us say, 8 cpus with the same unit size, each cpu >>>> has 16 * 8 = 128 bytes, with 8 cpus, total will be 128 * 8 = 1024 >>>> bytes. >>>> So if the number of cpus doubles, the number of memory consumption >>>> will be 4 times. So for a system with large number of cpus, the >>>> memory consumption goes up quickly with quadratic order. >>>> For example, for 4KB percpu allocation, 128 cpus. The total memory >>>> consumption will 4KB * 128 * 128 = 64MB. Things will become >>>> worse if the number of cpus is bigger (e.g., 512, 1024, etc.) SNIP >>>> +#ifdef CONFIG_MEMCG_KMEM >>>> + objcg = get_obj_cgroup_from_current(); >>>> +#endif >>>> + for_each_possible_cpu(cpu) { >>>> + cc = per_cpu_ptr(pcc, cpu); >>>> + c = &cc->cache[i]; >>>> + if (cpu == 0 && c->unit_size) >>>> + goto out; >>>> + >>>> + c->unit_size = unit_size; >>>> + c->objcg = objcg; >>>> + c->percpu_size = percpu_size; >>>> + c->tgt = c; >>>> + >>>> + init_refill_work(c); >>>> + prefill_mem_cache(c, cpu); >>>> + >>>> + if (cpu == 0) { >>>> + err = check_obj_size(c, i); >>>> + if (err) { >>>> + bpf_mem_alloc_destroy_cache(c); >>> It seems drain_mem_cache() will be enough. Have you considered setting >> >> At prefill stage, looks like the following is enough: >> free_all(__llist_del_all(&c->free_llist), percpu); >> But I agree that drain_mem_cache() is simpler and is >> easier for future potential code change. >> >>> low_watermark as 0 to prevent potential refill in unit_alloc() if the >>> initialization of the current unit fails ? >> >> I think it does make sense. For non-fix-size non-percpu prefill, >> if check_obj_size() failed, the prefill will fail, which include >> all buckets. >> >> In this case, if it fails for a particular bucket, we should >> make sure that bucket always return NULL ptr, so setting the >> low_watermark to 0 does make sense. > > Thinking again. If the initialization of the current unit > failed, the verification will fail and the corresponding > bpf program will not be able to do memory alloc, so we > should be fine. > > But it is totally possible that some prog later may > call bpf_mem_alloc_percpu_unit_init() again with the > same size/bucket. So we should simply reset bpf_mem_cache > to 0 during the previous failed bpf_mem_alloc_percpu_unit_init() > call. Is it possible that check_obj_size() may initially > returns an error but sometime later something in > the kernel changed and the check_obj_size() with the > same size could return true? Resetting bpf_mem_cache as 0 is much simpler and easier to understand than resetting low_watermark as 0. For per-cpu allocation, the return value of pcpu_alloc_size() is stable and I don't think it will change like ksize() does(), so it is not possible that the previous check_obj_size() failed, but the new check_obj_size() for the same unit_size succeeds. > > >> >>>> + goto out; >>>> + } >>>> + } >>>> + } >>>> + >>>> +out: >>>> + return err; >>>> +} >>>> + >>>> static void check_mem_cache(struct bpf_mem_cache *c) >>>> { >>>> WARN_ON_ONCE(!llist_empty(&c->free_by_rcu_ttrace)); >>>> >>> . >>> >>