On 2025-01-11 06:06, Ritesh Harjani (IBM) wrote:
gather_bootmem_prealloc() function assumes the start nid as 0 and size as num_node_state(N_MEMORY). That means in case if memory attached numa nodes are interleaved, then gather_bootmem_prealloc_parallel() will fail to scan few of these nodes. Since memory attached numa nodes can be interleaved in any fashion, hence ensure that the current code checks for all numa node ids (.size = nr_node_ids). Let's still keep max_threads as N_MEMORY, so that it can distributes all nr_node_ids among the these many no. threads.
With this change we guarantee that we check huge_boot_pages[] for all available nodes (vs. only for num_node_state(N_MEMORY)) which looks correct to me so: Reviewed-by: Luiz Capitulino <luizcap@xxxxxxxxxx> Now, although not really related to this patch, there's one detail: in gather_bootmem_prealloc_node() we call prep_and_add_bootmem_folios() even when folio_list is empty. This may cause a few calls to flush_tlb_all() down the code path when CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP=y even when huge_boot_pages[] is empty...
e.g. qemu cmdline ======================== numa_cmd="-numa node,nodeid=1,memdev=mem1,cpus=2-3 -numa node,nodeid=0,cpus=0-1 -numa dist,src=0,dst=1,val=20" mem_cmd="-object memory-backend-ram,id=mem1,size=16G" w/o this patch for cmdline (default_hugepagesz=1GB hugepagesz=1GB hugepages=2): ========================== ~ # cat /proc/meminfo |grep -i huge AnonHugePages: 0 kB ShmemHugePages: 0 kB FileHugePages: 0 kB HugePages_Total: 0 HugePages_Free: 0 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 1048576 kB Hugetlb: 0 kB with this patch for cmdline (default_hugepagesz=1GB hugepagesz=1GB hugepages=2): =========================== ~ # cat /proc/meminfo |grep -i huge AnonHugePages: 0 kB ShmemHugePages: 0 kB FileHugePages: 0 kB HugePages_Total: 2 HugePages_Free: 2 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 1048576 kB Hugetlb: 2097152 kB Fixes: b78b27d02930 ("hugetlb: parallelize 1G hugetlb initialization") Cc: Donet Tom <donettom@xxxxxxxxxxxxx> Cc: Gang Li <gang.li@xxxxxxxxx> Cc: Daniel Jordan <daniel.m.jordan@xxxxxxxxxx> Cc: Muchun Song <muchun.song@xxxxxxxxx> Cc: David Rientjes <rientjes@xxxxxxxxxx> Cc: Sourabh Jain <sourabhjain@xxxxxxxxxxxxx> Cc: linux-mm@xxxxxxxxx Suggested-by: Muchun Song <muchun.song@xxxxxxxxx> Reported-by: Pavithra Prakash <pavrampu@xxxxxxxxxxxxx> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@xxxxxxxxx> --- v1 -> v2: 1. Made .size = nr_node_ids instead of only online nodes as suggested by Muchun. [v1]: https://lore.kernel.org/linux-mm/7e0ca1e8acd7dd5c1fe7cbb252de4eb55a8e851b.1727984881.git.ritesh.list@xxxxxxxxx mm/hugetlb.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index c498874a7170..4e2a1e907ec5 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -3289,7 +3289,7 @@ static void __init gather_bootmem_prealloc(void) .thread_fn = gather_bootmem_prealloc_parallel, .fn_arg = NULL, .start = 0, - .size = num_node_state(N_MEMORY), + .size = nr_node_ids, .align = 1, .min_chunk = 1, .max_threads = num_node_state(N_MEMORY), -- 2.39.5