On Wed, Oct 16, 2024 at 03:24:18PM +0300, Mike Rapoport wrote:
From: "Mike Rapoport (Microsoft)" <rppt@xxxxxxxxxx> vmalloc allocations with VM_ALLOW_HUGE_VMAP that do not explicitly specify node ID will use huge pages only if size_per_node is larger than a huge page. Still the actual allocated memory is not distributed between nodes and there is no advantage in such approach. On the contrary, BPF allocates SZ_2M * num_possible_nodes() for each new bpf_prog_pack, while it could do with a single huge page per pack. Don't account for number of nodes for VM_ALLOW_HUGE_VMAP with NUMA_NO_NODE and use huge pages whenever the requested allocation size is larger than a huge page. Signed-off-by: Mike Rapoport (Microsoft) <rppt@xxxxxxxxxx> Reviewed-by: Christoph Hellwig <hch@xxxxxx> --- mm/vmalloc.c | 9 ++------- 1 file changed, 2 insertions(+), 7 deletions(-) diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 634162271c00..86b2344d7461 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -3763,8 +3763,6 @@ void *__vmalloc_node_range_noprof(unsigned long size, unsigned long align, } if (vmap_allow_huge && (vm_flags & VM_ALLOW_HUGE_VMAP)) { - unsigned long size_per_node; - /* * Try huge pages. Only try for PAGE_KERNEL allocations, * others like modules don't yet expect huge pages in @@ -3772,13 +3770,10 @@ void *__vmalloc_node_range_noprof(unsigned long size, unsigned long align, * supporting them. */ - size_per_node = size; - if (node == NUMA_NO_NODE) - size_per_node /= num_online_nodes(); - if (arch_vmap_pmd_supported(prot) && size_per_node >= PMD_SIZE) + if (arch_vmap_pmd_supported(prot) && size >= PMD_SIZE) shift = PMD_SHIFT; else - shift = arch_vmap_pte_supported_shift(size_per_node); + shift = arch_vmap_pte_supported_shift(size); align = max(real_align, 1UL << shift); size = ALIGN(real_size, 1UL << shift);
Looking at this place, i see that an overwriting a "size" approach seems as something that is a bit hard to follow. Below we have following code: <snip> ... again: area = __get_vm_area_node(real_size, align, shift, VM_ALLOC | VM_UNINITIALIZED | vm_flags, start, end, node, gfp_mask, caller); ... <snip> where we pass a "real_size", whereas there is only one place in the __vmalloc_node_range_noprof() function where a "size" is used. It is in the end of function: <snip> ... size = PAGE_ALIGN(size); if (!(vm_flags & VM_DEFER_KMEMLEAK)) kmemleak_vmalloc(area, size, gfp_mask); return area->addr; <snip> As fro this patch: Reviewed-by: Uladzislau Rezki (Sony) <urezki@xxxxxxxxx> -- Uladzislau Rezki