Re: [PATCH v2] mm/hugetlb: Fix hugepage allocation for interleaved memory nodes

Luiz Capitulino <luizcap@xxxxxxxxxx> · Mon, 20 Jan 2025 16:39:23 -0500

On 2025-01-11 06:06, Ritesh Harjani (IBM) wrote:
gather_bootmem_prealloc() assumes the start nid is 0 and the size is
num_node_state(N_MEMORY). That means if the NUMA nodes with memory attached
are interleaved, gather_bootmem_prealloc_parallel() will fail to scan
some of these nodes.

Since NUMA nodes with memory attached can be interleaved in any fashion,
ensure that the code checks all NUMA node ids
(.size = nr_node_ids). Let's still keep max_threads as
num_node_state(N_MEMORY), so that all nr_node_ids are distributed among
that many threads.
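
To illustrate the gap described above, here is a minimal userspace sketch.
It is not kernel code: struct topo, count_memory_nodes() and
memory_nodes_scanned() are hypothetical names that only mimic
num_node_state(N_MEMORY) and a scan over node ids [0, size).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the kernel's node state:
 * has_memory[nid] is true when node nid has memory attached.
 */
struct topo {
	const bool *has_memory;
	size_t nr_node_ids;	/* highest possible node id + 1 */
};

/* Mimics num_node_state(N_MEMORY): counts nodes that have memory. */
static size_t count_memory_nodes(const struct topo *t)
{
	size_t n = 0;

	for (size_t nid = 0; nid < t->nr_node_ids; nid++)
		if (t->has_memory[nid])
			n++;
	return n;
}

/* Counts how many memory nodes are visited when scanning ids [0, size). */
static size_t memory_nodes_scanned(const struct topo *t, size_t size)
{
	size_t seen = 0;

	assert(size <= t->nr_node_ids);
	for (size_t nid = 0; nid < size; nid++)
		if (t->has_memory[nid])
			seen++;
	return seen;
}
```

With has_memory = { false, true } (node 0 memoryless, node 1 with memory),
count_memory_nodes() is 1, so a scan of size 1 visits only node 0 and
finds no memory node, while a scan of size nr_node_ids (2) reaches node 1.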

With this change we guarantee that we check huge_boot_pages[] for all
possible node ids (vs. only the first num_node_state(N_MEMORY) ids),
which looks correct to me, so:

  Reviewed-by: Luiz Capitulino <luizcap@xxxxxxxxxx>

Now, although not really related to this patch, there's one detail: in
gather_bootmem_prealloc_node() we call prep_and_add_bootmem_folios()
even when folio_list is empty. This may cause a few calls to
flush_tlb_all() down the code path when CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP=y
even when huge_boot_pages[] is empty...
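
One possible way to avoid that extra work would be an early return when
there is nothing to process. The sketch below is a userspace illustration
only, not a tested kernel change: gather_node(), process_folios() and
flush_calls are hypothetical names, and process_folios() merely counts
how often the expensive path would be entered.

```c
#include <assert.h>
#include <stddef.h>

static unsigned int flush_calls;	/* counts simulated expensive calls */

/*
 * Hypothetical stand-in for the expensive path (the one that may end up
 * flushing TLBs); here it only bumps a counter.
 */
static void process_folios(const int *folios, size_t n)
{
	assert(folios != NULL && n > 0);
	flush_calls++;
}

/* Skip the expensive path entirely when the per-node list is empty. */
static void gather_node(const int *folios, size_t n)
{
	if (n == 0)
		return;
	process_folios(folios, n);
}
```

Calling gather_node() with an empty list leaves flush_calls untouched;
only a non-empty list reaches process_folios().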


e.g. qemu cmdline
========================
numa_cmd="-numa node,nodeid=1,memdev=mem1,cpus=2-3 -numa node,nodeid=0,cpus=0-1 -numa dist,src=0,dst=1,val=20"
mem_cmd="-object memory-backend-ram,id=mem1,size=16G"

w/o this patch for cmdline (default_hugepagesz=1GB hugepagesz=1GB hugepages=2):
==========================
~ # cat /proc/meminfo  |grep -i huge
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
FileHugePages:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:    1048576 kB
Hugetlb:               0 kB

with this patch for cmdline (default_hugepagesz=1GB hugepagesz=1GB hugepages=2):
===========================
~ # cat /proc/meminfo |grep -i huge
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
FileHugePages:         0 kB
HugePages_Total:       2
HugePages_Free:        2
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:    1048576 kB
Hugetlb:         2097152 kB

Fixes: b78b27d02930 ("hugetlb: parallelize 1G hugetlb initialization")
Cc: Donet Tom <donettom@xxxxxxxxxxxxx>
Cc: Gang Li <gang.li@xxxxxxxxx>
Cc: Daniel Jordan <daniel.m.jordan@xxxxxxxxxx>
Cc: Muchun Song <muchun.song@xxxxxxxxx>
Cc: David Rientjes <rientjes@xxxxxxxxxx>
Cc: Sourabh Jain <sourabhjain@xxxxxxxxxxxxx>
Cc: linux-mm@xxxxxxxxx
Suggested-by: Muchun Song <muchun.song@xxxxxxxxx>
Reported-by: Pavithra Prakash <pavrampu@xxxxxxxxxxxxx>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@xxxxxxxxx>
---
v1 -> v2:
1. Made .size = nr_node_ids instead of only online nodes as suggested by Muchun.

[v1]: https://lore.kernel.org/linux-mm/7e0ca1e8acd7dd5c1fe7cbb252de4eb55a8e851b.1727984881.git.ritesh.list@xxxxxxxxx

  mm/hugetlb.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index c498874a7170..4e2a1e907ec5 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3289,7 +3289,7 @@ static void __init gather_bootmem_prealloc(void)
  		.thread_fn	= gather_bootmem_prealloc_parallel,
  		.fn_arg		= NULL,
  		.start		= 0,
-		.size		= num_node_state(N_MEMORY),
+		.size		= nr_node_ids,
  		.align		= 1,
  		.min_chunk	= 1,
  		.max_threads	= num_node_state(N_MEMORY),
--
2.39.5