On Tue, Jul 30, 2024 at 09:12:55AM +0800, Huang, Ying wrote:
> > Right now HMAT appears to be used prescriptively, this despite the fact
> > that there was a clear intent to separate CPU-nodes and non-CPU-nodes in
> > the memory-tier code. So this patch simply realizes this intent when the
> > hints are not very reasonable.
>
> If HMAT isn't available, it's hard to put memory devices to
> appropriate memory tiers without other information. In commit
> 992bf77591cb ("mm/demotion: add support for explicit memory tiers"),
> Aneesh pointed out that it doesn't work for his system to put
> non-CPU-nodes in lower tier.
>

Per Aneesh in 992bf77591cb - The code explicitly states the intent is
to put non-CPU-nodes in a lower tier by default.

    The current implementation puts all nodes with CPU into the highest
    tier, and builds the tier hierarchy by establishing the per-node
    demotion targets based on the distances between nodes.

This is accurate for the current code.

    The current tier initialization code always initializes each
    memory-only NUMA node into a lower tier.

This is *broken* for the currently upstream code.

This appears to be the result of the hmat adistance callback introduction
(though it may have been broken before that).

~Gregory