Hello Ritesh,
On 11/01/25 16:36, Ritesh Harjani (IBM) wrote:
The gather_bootmem_prealloc() function assumes a start nid of 0 and a size of
num_node_state(N_MEMORY). That means if the memory-attached NUMA nodes are
interleaved, gather_bootmem_prealloc_parallel() will fail to scan some of
these nodes.
Yes, I observed this issue on a PowerVM platform where:
[root ~]# dmesg | grep -i numa
[ 0.000000] numa: Partition configured for 32 NUMA nodes.
[ 0.631132] numa: Node 1 CPUs: 0-15
With the above NUMA configuration, passing `hugepagesz=1G hugepages=2` left
the HugeTLB state in /proc/meminfo at 0:
# cat /proc/meminfo
snip...
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 0 kB
Whereas with this fix included, passing `hugepagesz=1G hugepages=2` on the
same machine shows the correct HugeTLB state in /proc/meminfo:
# cat /proc/meminfo
snip...
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 2097152 kB
Thanks for the fix:
Tested-by: Sourabh Jain <sourabhjain@xxxxxxxxxxxxx>
- Sourabh Jain
Since memory-attached NUMA nodes can be interleaved in any fashion, ensure
that the code checks all NUMA node ids (.size = nr_node_ids). Still keep
max_threads as num_node_state(N_MEMORY), so that all nr_node_ids are
distributed among that many threads.
e.g. qemu cmdline
========================
numa_cmd="-numa node,nodeid=1,memdev=mem1,cpus=2-3 -numa node,nodeid=0,cpus=0-1 -numa dist,src=0,dst=1,val=20"
mem_cmd="-object memory-backend-ram,id=mem1,size=16G"
w/o this patch for cmdline (default_hugepagesz=1GB hugepagesz=1GB hugepages=2):
==========================
~ # cat /proc/meminfo |grep -i huge
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
FileHugePages: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 1048576 kB
Hugetlb: 0 kB
with this patch for cmdline (default_hugepagesz=1GB hugepagesz=1GB hugepages=2):
===========================
~ # cat /proc/meminfo |grep -i huge
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
FileHugePages: 0 kB
HugePages_Total: 2
HugePages_Free: 2
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 1048576 kB
Hugetlb: 2097152 kB
Fixes: b78b27d02930 ("hugetlb: parallelize 1G hugetlb initialization")
Cc: Donet Tom <donettom@xxxxxxxxxxxxx>
Cc: Gang Li <gang.li@xxxxxxxxx>
Cc: Daniel Jordan <daniel.m.jordan@xxxxxxxxxx>
Cc: Muchun Song <muchun.song@xxxxxxxxx>
Cc: David Rientjes <rientjes@xxxxxxxxxx>
Cc: Sourabh Jain <sourabhjain@xxxxxxxxxxxxx>
Cc: linux-mm@xxxxxxxxx
Suggested-by: Muchun Song <muchun.song@xxxxxxxxx>
Reported-by: Pavithra Prakash <pavrampu@xxxxxxxxxxxxx>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@xxxxxxxxx>
---
v1 -> v2:
1. Made .size = nr_node_ids instead of only online nodes as suggested by Muchun.
[v1]: https://lore.kernel.org/linux-mm/7e0ca1e8acd7dd5c1fe7cbb252de4eb55a8e851b.1727984881.git.ritesh.list@xxxxxxxxx
mm/hugetlb.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index c498874a7170..4e2a1e907ec5 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3289,7 +3289,7 @@ static void __init gather_bootmem_prealloc(void)
.thread_fn = gather_bootmem_prealloc_parallel,
.fn_arg = NULL,
.start = 0,
- .size = num_node_state(N_MEMORY),
+ .size = nr_node_ids,
.align = 1,
.min_chunk = 1,
.max_threads = num_node_state(N_MEMORY),
--
2.39.5