On Tue, Sep 09, 2014 at 03:46:30PM +0100, Mel Gorman wrote:
> Changelog since v1
> o Default to zone-ordering on 32-bit and remove heuristics
> o Expand changelog
>
> Zones are allocated by the page allocator in either node or zone order.
> Node ordering is preferred in terms of locality and is applied automatically
> in one of three cases.
>
> 1. If a node has only low memory
>
> 2. If DMA/DMA32 is a high percentage of memory
>
> 3. If low memory on a single node is greater than 70% of the node size
>
> Otherwise zone ordering is used to preserve low memory for devices that
> require it. Unfortunately, a consequence of this is that a machine with
> balanced NUMA nodes will experience different performance characteristics
> depending on which node a process happens to start on.
>
> The point of zone ordering is to protect the lower zones for devices that
> require DMA/DMA32 memory. When NUMA was first introduced, this was critical
> as 32-bit NUMA machines existed and exhausting low memory triggered OOMs
> easily as so many allocations required low memory. On 64-bit machines the
> primary concern is devices that are 32-bit only, which is less severe than
> the low memory exhaustion problem on 32-bit NUMA. There seem to be very
> few devices that depend on it.
>
> AGP -- I assume this is getting rarer, but even then I think the allocations
> happen early in boot where lowmem pressure is less of a problem.
>
> DRM -- If the device is 32-bit only then there may be low pressure. I didn't
> evaluate these in detail but it looks like some of these are mobile
> graphics cards. There are not many NUMA laptops out there. DRM folk should
> know better though.
>
> Some TV cards -- Much demand for 32-bit capable TV cards on NUMA machines?
>
> B43 wireless card -- again not really a NUMA thing.
>
> I cannot find a good reason to incur a performance penalty on all 64-bit NUMA
> machines in case someone throws a brain-damaged TV or graphics card in there.
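The difference between node and zone ordering can be made concrete with a small model of the fallback list an allocation walks. This is a hypothetical Python sketch, not kernel code: it assumes remote nodes are tried in ascending node-id order (the real allocator ranks them by NUMA distance), models only the DMA, DMA32 and Normal zones, and the helper names `node_order` and `zone_order` are invented for illustration.

```python
# Hypothetical model of page-allocator fallback ("zonelist") ordering.
# Assumptions, not kernel code: remote nodes are tried in ascending node-id
# order instead of by NUMA distance, and only three zone types are modelled.

ZONE_TYPES = ["DMA", "DMA32", "Normal"]  # lowest to highest


def _nodes_from(local, layout):
    """Local node first, then the remaining nodes in ascending id order."""
    return [local] + sorted(n for n in layout if n != local)


def node_order(local, layout):
    """Node ordering: try every zone on one node (highest zone first)
    before moving on to the next node. Favours locality."""
    return [(nid, zt)
            for nid in _nodes_from(local, layout)
            for zt in reversed(ZONE_TYPES)
            if zt in layout[nid]]


def zone_order(local, layout):
    """Zone ordering: try the highest zone type on every node before
    falling back to the next lower type. Preserves DMA/DMA32 at the
    cost of locality."""
    return [(nid, zt)
            for zt in reversed(ZONE_TYPES)
            for nid in _nodes_from(local, layout)
            if zt in layout[nid]]


if __name__ == "__main__":
    # Assumed layout: node 0 holds DMA, DMA32 and part of Normal;
    # node 1 has only a Normal zone.
    layout = {0: {"DMA", "DMA32", "Normal"}, 1: {"Normal"}}
    for local in (0, 1):
        print(f"node {local} node-order:", node_order(local, layout))
        print(f"node {local} zone-order:", zone_order(local, layout))
```

With this assumed layout a task on node 1 gets the same fallback list either way, but a task on node 0 under zone ordering spills into node 1's Normal zone before touching its own DMA32 zone. That asymmetry is one way the starting node can change locality.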
> This patch defaults to node-ordering on 64-bit NUMA machines. I was tempted
> to make it the default everywhere, but I understand that some embedded arches
> may be using 32-bit NUMA where I cannot predict the consequences.
>
> The performance impact depends on the workload and the characteristics of
> the machine. The machine I tested on had a large Normal zone on node 0, so
> the impact is within the noise for the majority of tests. The allocation
> stats show that more allocation requests were served from DMA32 and the
> local node. Running SpecJBB with multiple JVMs and automatic NUMA balancing
> disabled, the results were:
>
> specjbb
>                      3.17.0-rc2            3.17.0-rc2
>                         vanilla        nodeorder-v1r1
> Min     1    29534.00 (  0.00%)    30020.00 (  1.65%)
> Min    10   115717.00 (  0.00%)   134038.00 ( 15.83%)
> Min    19   109718.00 (  0.00%)   114186.00 (  4.07%)
> Min    28   104459.00 (  0.00%)   103639.00 ( -0.78%)
> Min    37    98245.00 (  0.00%)   103756.00 (  5.61%)
> Min    46    97198.00 (  0.00%)    96197.00 ( -1.03%)
> Mean    1    30953.25 (  0.00%)    31917.75 (  3.12%)
> Mean   10   124432.50 (  0.00%)   140904.00 ( 13.24%)
> Mean   19   116033.50 (  0.00%)   119294.75 (  2.81%)
> Mean   28   108365.25 (  0.00%)   106879.50 ( -1.37%)
> Mean   37   102984.75 (  0.00%)   106924.25 (  3.83%)
> Mean   46   100783.25 (  0.00%)   105368.50 (  4.55%)
> Stddev  1     1260.38 (  0.00%)     1109.66 ( 11.96%)
> Stddev 10     7434.03 (  0.00%)     5171.91 ( 30.43%)
> Stddev 19     8453.84 (  0.00%)     5309.59 ( 37.19%)
> Stddev 28     4184.55 (  0.00%)     2906.63 ( 30.54%)
> Stddev 37     5409.49 (  0.00%)     3192.12 ( 40.99%)
> Stddev 46     4521.95 (  0.00%)     7392.52 (-63.48%)
> Max     1    32738.00 (  0.00%)    32719.00 ( -0.06%)
> Max    10   136039.00 (  0.00%)   148614.00 (  9.24%)
> Max    19   130566.00 (  0.00%)   127418.00 ( -2.41%)
> Max    28   115404.00 (  0.00%)   111254.00 ( -3.60%)
> Max    37   112118.00 (  0.00%)   111732.00 ( -0.34%)
> Max    46   108541.00 (  0.00%)   116849.00 (  7.65%)
> TPut    1   123813.00 (  0.00%)   127671.00 (  3.12%)
> TPut   10   497730.00 (  0.00%)   563616.00 ( 13.24%)
> TPut   19   464134.00 (  0.00%)   477179.00 (  2.81%)
> TPut   28   433461.00 (  0.00%)   427518.00 ( -1.37%)
> TPut   37   411939.00 (  0.00%)   427697.00 (  3.83%)
> TPut   46   403133.00 (  0.00%)   421474.00 (  4.55%)
>
>                                  3.17.0-rc2     3.17.0-rc2
>                                     vanilla nodeorder-v1r1
> DMA allocs                                0              0
> DMA32 allocs                             57        1491992
> Normal allocs                      32543566       30026383
> Movable allocs                            0              0
> Direct pages scanned                      0              0
> Kswapd pages scanned                      0              0
> Kswapd pages reclaimed                    0              0
> Direct pages reclaimed                    0              0
> Kswapd efficiency                      100%           100%
> Kswapd velocity                       0.000          0.000
> Direct efficiency                      100%           100%
> Direct velocity                       0.000          0.000
> Percentage direct scans                  0%             0%
> Zone normal velocity                  0.000          0.000
> Zone dma32 velocity                   0.000          0.000
> Zone dma velocity                     0.000          0.000
> THP fault alloc                       55164          52987
> THP collapse alloc                      139            147
> THP splits                               26             21
> NUMA alloc hit                      4169066        4250692
> NUMA alloc miss                           0              0
>
> Note that there were more DMA32 allocations with the patch applied. In this
> particular case there was no difference in numa_hit and numa_miss. The
> expectation is that DMA32 was being used at the low watermark instead of
> falling into the slow path. kswapd was not woken, but it is not woken for
> THP allocations.
>
> On 32-bit, this patch defaults to zone-ordering as low memory depletion
> can be a serious problem on 32-bit large memory machines. If the default
> ordering were node, then processes on node 0 would deplete the Normal zone
> due to normal activity. The problem is worse if CONFIG_HIGHPTE is not
> set. If combined with large amounts of dirty/writeback pages in the Normal
> zone, then there is also a high risk of OOM. The heuristics are removed
> as it's not clear they were ever important on 32-bit. They were only
> relevant for setting node-ordering on 64-bit.
>
> Signed-off-by: Mel Gorman <mgorman@xxxxxxx>

Acked-by: Johannes Weiner <hannes@xxxxxxxxxxx>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@xxxxxxxxx