Hi,

On 10/21/2022 2:01 AM, Hao Luo wrote:
> On Thu, Oct 20, 2022 at 6:57 AM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
>> From: Hou Tao <houtao1@xxxxxxxxxx>
>>
>> Since commit fba1a1c6c912 ("bpf: Convert hash map to bpf_mem_alloc."),
>> the numa node setting for non-preallocated hash tables is ignored. The
>> reason is that the bpf memory allocator only supports NUMA_NO_NODE, but
>> it seems trivial to support the numa node setting in the bpf memory
>> allocator.
>>
>> So add support for setting the numa node in the bpf memory allocator
>> and update the hash map accordingly.
>>
>> Fixes: fba1a1c6c912 ("bpf: Convert hash map to bpf_mem_alloc.")
>> Signed-off-by: Hou Tao <houtao1@xxxxxxxxxx>
>> ---
> Looks good to me with a few nits.
>
> <...>
>> diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
>> index fc116cf47d24..44c531ba9534 100644
>> --- a/kernel/bpf/memalloc.c
>> +++ b/kernel/bpf/memalloc.c
> <...>
>> +static inline bool is_valid_numa_node(int numa_node, bool percpu)
>> +{
>> +	return numa_node == NUMA_NO_NODE ||
>> +	       (!percpu && (unsigned int)numa_node < nr_node_ids);
> Maybe also check node_online? There is a similar helper function in
> kernel/bpf/syscall.c.

Will factor it out as a helper function and use it in the bpf memory
allocator in v2.

>
> It may help debugging if we could log the reason here, for example, a
> PERCPU map with numa_node specified.

Not sure about that, because the validity check must already have been done
in the map-related code.

>
>> +}
>> +
>> +/* The initial prefill runs in the context of the map creation process,
>> + * so if the preferred numa node is NUMA_NO_NODE, the numa node of the
>> + * current cpu needs to be used instead.
>> + */
>> +static inline int get_prefill_numa_node(int numa_node, int cpu)
>> +{
>> +	int prefill_numa_node;
>> +
>> +	if (numa_node == NUMA_NO_NODE)
>> +		prefill_numa_node = cpu_to_node(cpu);
>> +	else
>> +		prefill_numa_node = numa_node;
>> +	return prefill_numa_node;
>> }
> nit: an alternative implementation is
>
> return numa_node == NUMA_NO_NODE ? cpu_to_node(cpu) : numa_node;

It is shorter and better. Will do it in v2.

>
>>  /* When size != 0 bpf_mem_cache for each cpu.
>> @@ -359,13 +383,17 @@ static void prefill_mem_cache(struct bpf_mem_cache *c, int cpu)
>>   * kmalloc/kfree. Max allocation size is 4096 in this case.
>>   * This is bpf_dynptr and bpf_kptr use case.
>>   */
> We added a parameter to this function, I think it is worth mentioning
> the 'numa_node' argument's behavior under different values of
> 'percpu'.

How about the following comment?

 * For the per-cpu allocator (percpu=true), the only valid value of numa_node
 * is NUMA_NO_NODE. For the non-per-cpu allocator, if numa_node is
 * NUMA_NO_NODE, the preferred memory allocation node is the numa node where
 * the allocating CPU is located, else the preferred node is the specified
 * numa_node.

>
>> -int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu)
>> +int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, int numa_node,
>> +		       bool percpu)
>>  {
> <...>
>> --
>> 2.29.2
>>
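
To make the first nit concrete: below is a minimal sketch of what the helper
promised for v2 might look like, assuming it merges the nr_node_ids bound
check from the quoted hunk with the node_online() check that map_create() in
kernel/bpf/syscall.c already applies. The helper name is invented here for
illustration; the actual v2 patch may differ.

static inline bool bpf_mem_alloc_numa_node_ok(int numa_node, bool percpu)
{
	/* NUMA_NO_NODE is always acceptable, for both per-cpu and
	 * non-per-cpu allocators.
	 */
	if (numa_node == NUMA_NO_NODE)
		return true;

	/* A per-cpu allocator spreads its memory across nodes, so it
	 * accepts nothing but NUMA_NO_NODE. Otherwise reject nodes that
	 * are out of range or offline, mirroring the map_create() check.
	 */
	return !percpu && (unsigned int)numa_node < nr_node_ids &&
	       node_online(numa_node);
}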
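
And a guess at the hash map side once bpf_mem_alloc_init() takes the extra
argument; this call site is not part of the quoted hunks, so the exact form
in the patch may differ:

	/* In htab_map_alloc(): pass the map's configured node down
	 * (NUMA_NO_NODE when BPF_F_NUMA_NODE was not set) instead of
	 * hard-coding NUMA_NO_NODE.
	 */
	err = bpf_mem_alloc_init(&htab->ma, htab->elem_size,
				 htab->map.numa_node, false);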