Re: [PATCH bpf-next v2 3/6] bpf: Allow per unit prefill for non-fix-size percpu memory allocator

Hou Tao <houtao@xxxxxxxxxxxxxxx> · Fri, 15 Dec 2023 15:40:40 +0800

Hi,

On 12/15/2023 3:27 PM, Yonghong Song wrote:
>
> On 12/14/23 10:50 PM, Yonghong Song wrote:
>>
>> On 12/14/23 7:19 PM, Hou Tao wrote:
>>>
>>> On 12/15/2023 8:12 AM, Yonghong Song wrote:
>>>> Commit 41a5db8d8161 ("Add support for non-fix-size percpu mem
>>>> allocation")
>>>> added support for non-fix-size percpu memory allocation.
>>>> Such allocation will allocate percpu memory for all buckets on all
>>>> cpus and the memory consumption is in the order to quadratic.
>>>> For example, let us say, 4 cpus, unit size 16 bytes, so each
>>>> cpu has 16 * 4 = 64 bytes, with 4 cpus, total will be 64 * 4 = 256
>>>> bytes.
>>>> Then let us say, 8 cpus with the same unit size, each cpu
>>>> has 16 * 8 = 128 bytes, with 8 cpus, total will be 128 * 8 = 1024
>>>> bytes.
>>>> So if the number of cpus doubles, the number of memory consumption
>>>> will be 4 times. So for a system with large number of cpus, the
>>>> memory consumption goes up quickly with quadratic order.
>>>> For example, for 4KB percpu allocation, 128 cpus. The total memory
>>>> consumption will 4KB * 128 * 128 = 64MB. Things will become
>>>> worse if the number of cpus is bigger (e.g., 512, 1024, etc.)

SNIP
>>>> +#ifdef CONFIG_MEMCG_KMEM
>>>> +    objcg = get_obj_cgroup_from_current();
>>>> +#endif
>>>> +    for_each_possible_cpu(cpu) {
>>>> +        cc = per_cpu_ptr(pcc, cpu);
>>>> +        c = &cc->cache[i];
>>>> +        if (cpu == 0 && c->unit_size)
>>>> +            goto out;
>>>> +
>>>> +        c->unit_size = unit_size;
>>>> +        c->objcg = objcg;
>>>> +        c->percpu_size = percpu_size;
>>>> +        c->tgt = c;
>>>> +
>>>> +        init_refill_work(c);
>>>> +        prefill_mem_cache(c, cpu);
>>>> +
>>>> +        if (cpu == 0) {
>>>> +            err = check_obj_size(c, i);
>>>> +            if (err) {
>>>> +                bpf_mem_alloc_destroy_cache(c);
>>> It seems drain_mem_cache() will be enough. Have you considered setting
>>
>> At prefill stage, looks like the following is enough:
>>     free_all(__llist_del_all(&c->free_llist), percpu);
>> But I agree that drain_mem_cache() is simpler and is
>> easier for future potential code change.
>>
>>> low_watermark as 0 to prevent potential refill in unit_alloc() if the
>>> initialization of the current unit fails ?
>>
>> I think it does make sense. For non-fix-size non-percpu prefill,
>> if check_obj_size() failed, the prefill will fail, which include
>> all buckets.
>>
>> In this case, if it fails for a particular bucket, we should
>> make sure that bucket always return NULL ptr, so setting the
>> low_watermark to 0 does make sense.
>
> Thinking again. If the initialization of the current unit
> failed, the verification will fail and the corresponding
> bpf program will not be able to do memory alloc, so we
> should be fine.
>
> But it is totally possible that some prog later may
> call bpf_mem_alloc_percpu_unit_init() again with the
> same size/bucket. So we should simply reset bpf_mem_cache
> to 0 during the previous failed bpf_mem_alloc_percpu_unit_init()
> call. Is it possible that check_obj_size() may initially
> returns an error but sometime later something in
> the kernel changed and the check_obj_size() with the
> same size could return true?

Resetting bpf_mem_cache as 0 is much simpler and easier to understand
than resetting low_watermark as 0. For per-cpu allocation, the return
value of pcpu_alloc_size() is stable and I don't think it will change
like ksize() does(), so it is not possible that the previous
check_obj_size() failed, but the new check_obj_size() for the same
unit_size succeeds.

>
>
>>
>>>> +                goto out;
>>>> +            }
>>>> +        }
>>>> +    }
>>>> +
>>>> +out:
>>>> +    return err;
>>>> +}
>>>> +
>>>>   static void check_mem_cache(struct bpf_mem_cache *c)
>>>>   {
>>>> WARN_ON_ONCE(!llist_empty(&c->free_by_rcu_ttrace));
>>>>
>>> .
>>>
>>