On a test system with 2 physical NUMA nodes, 8GB of memory, a small
memory hole from 640KB to 1MB, and a large memory hole from 3GB to 4GB,
if "numa=fake=1G" is used on the kernel command line, the resulting fake
NUMA nodes are as follows,

  NUMA: Node 0 [mem 0x00000000-0x0009ffff] + [mem 0x00100000-0xbfffffff] -> [mem 0x00000000-0xbfffffff]
  NUMA: Node 0 [mem 0x00000000-0xbfffffff] + [mem 0x100000000-0x13fffffff] -> [mem 0x00000000-0x13fffffff]
  Faking node 0 at [mem 0x0000000000000000-0x0000000041ffffff] (1056MB)
  Faking node 1 at [mem 0x0000000140000000-0x000000017fffffff] (1024MB)
  Faking node 2 at [mem 0x0000000042000000-0x0000000081ffffff] (1024MB)
  Faking node 3 at [mem 0x0000000180000000-0x00000001bfffffff] (1024MB)
  Faking node 4 at [mem 0x0000000082000000-0x000000013fffffff] (3040MB)
  Faking node 5 at [mem 0x00000001c0000000-0x00000001ffffffff] (1024MB)
  Faking node 6 at [mem 0x0000000200000000-0x000000023fffffff] (1024MB)

Here, 7 fake NUMA nodes are emulated.  The range of fake node 4 covers
3040MB, but that range includes the 1024MB memory hole from 3GB to 4GB,
so its usable size is 3040 - 1024 = 2016MB.  This is nearly 2 times the
size of the other fake nodes (about 1024MB).  This isn't a reasonable
split.  It is better to keep the fake node size from becoming too large
or too small.  So in this patch, the splitting algorithm is changed to
keep the fake node size between 1/2 and 3/2 of the specified node size.
After applying this patch, the resulting fake NUMA nodes become,

  Faking node 0 at [mem 0x0000000000000000-0x0000000041ffffff] (1056MB)
  Faking node 1 at [mem 0x0000000140000000-0x000000017fffffff] (1024MB)
  Faking node 2 at [mem 0x0000000042000000-0x0000000081ffffff] (1024MB)
  Faking node 3 at [mem 0x0000000180000000-0x00000001bfffffff] (1024MB)
  Faking node 4 at [mem 0x0000000082000000-0x0000000103ffffff] (2080MB)
  Faking node 5 at [mem 0x00000001c0000000-0x00000001ffffffff] (1024MB)
  Faking node 6 at [mem 0x0000000104000000-0x000000013fffffff] (960MB)
  Faking node 7 at [mem 0x0000000200000000-0x000000023fffffff] (1024MB)

The newly added node 6 is a little smaller than the specified node size
(960MB vs. 1024MB).  But the overall results look more reasonable.

Signed-off-by: "Huang, Ying" <ying.huang@xxxxxxxxx>
Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Cc: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
Cc: Andy Lutomirski <luto@xxxxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: Borislav Petkov <bp@xxxxxxxxx>
Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
Cc: Dan Williams <dan.j.williams@xxxxxxxxx>
Cc: David Rientjes <rientjes@xxxxxxxxxx>
Cc: Dave Jiang <dave.jiang@xxxxxxxxx>
---
 arch/x86/mm/numa_emulation.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/numa_emulation.c b/arch/x86/mm/numa_emulation.c
index 683cd12f4793..231469e1de6a 100644
--- a/arch/x86/mm/numa_emulation.c
+++ b/arch/x86/mm/numa_emulation.c
@@ -300,9 +300,10 @@ static int __init split_nodes_size_interleave_uniform(struct numa_meminfo *ei,
 			/*
 			 * If there won't be enough non-reserved memory for the
 			 * next node, this one must extend to the end of the
-			 * physical node.
+			 * physical node. The size of the emulated node should
+			 * be between size/2 and size*3/2.
 			 */
-			if ((limit - end - mem_hole_size(end, limit) < size)
+			if ((limit - end - mem_hole_size(end, limit) < size / 2)
 			    && !uniform)
 				end = limit;

-- 
2.28.0