The goal is to reduce the latency (and increase the success rate) of
direct async compaction by making it focus more on creating a
high-order page, at the expense of thoroughness. This should be useful,
for example, for THP allocations, which are still being reported as too
expensive, most recently in [2]. The series is based on an older
attempt [1] which I didn't finish, as it seemed to increase longer-term
fragmentation. Now it seems it doesn't, but I'll have to test more
properly.

Patch 2 makes the migration scanner skip whole order-aligned blocks
where isolation fails, as it takes just one unmigrated page to prevent
a high-order page from merging.

Patch 3 tries to reduce excessive freepage scanning (such as in [3]) by
allocating migration targets directly from the freelist. We just need
to be sure that the pages are not from the same block as the migrated
pages. This is also limited to direct async compaction and is not meant
to replace a (potentially redesigned) free scanner for other scenarios.

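As a rough illustration only (this is not the code from the patches),
the block arithmetic behind patches 2 and 3 could be sketched as below.
The helper names next_block_pfn() and same_block() are hypothetical and
do not appear in the series; "order" is assumed to be the order of the
allocation that compaction is trying to satisfy.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: after isolation fails at pfn, return the first
 * pfn of the next order-aligned block, so that the rest of the current
 * block can be skipped by the migration scanner.
 */
static unsigned long next_block_pfn(unsigned long pfn, unsigned int order)
{
	unsigned long block_pages = 1UL << order;

	assert(order < 8 * sizeof(unsigned long));
	/* Round down to the block start, then step one block forward. */
	return (pfn & ~(block_pages - 1)) + block_pages;
}

/*
 * Hypothetical helper: true if both pfns lie in the same order-aligned
 * block. A free page taken from the freelist would be rejected as a
 * migration target when this is true for the page being migrated.
 */
static bool same_block(unsigned long a, unsigned long b, unsigned int order)
{
	return (a >> order) == (b >> order);
}
```

With order 9 (a 2MB THP on x86 with 4KB pages), a failure at pfn 0x1234
would resume scanning at pfn 0x1400, and a free page at pfn 0x13ff
would not be used as a target for a page migrated from pfn 0x1200.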
Early tests with stress-highalloc configured to simulate THP
allocations:

                           4.4-rc2           4.4-rc2           4.4-rc2           4.4-rc2
                            0-test            1-test            2-test            3-test
Success 1 Min      1.00 (   0.00%)   2.00 (-100.00%)   2.00 (-100.00%)   3.00 (-200.00%)
Success 1 Mean     3.00 (   0.00%)   3.00 (   0.00%)   2.80 (   6.67%)   4.80 ( -60.00%)
Success 1 Max      6.00 (   0.00%)   4.00 (  33.33%)   5.00 (  16.67%)   7.00 ( -16.67%)
Success 2 Min      1.00 (   0.00%)   3.00 (-200.00%)   4.00 (-300.00%)   8.00 (-700.00%)
Success 2 Mean     3.80 (   0.00%)   4.00 (  -5.26%)   5.20 ( -36.84%)  11.00 (-189.47%)
Success 2 Max      8.00 (   0.00%)   7.00 (  12.50%)   6.00 (  25.00%)  13.00 ( -62.50%)
Success 3 Min     58.00 (   0.00%)  69.00 ( -18.97%)  53.00 (   8.62%)  66.00 ( -13.79%)
Success 3 Mean    67.40 (   0.00%)  74.00 (  -9.79%)  58.20 (  13.65%)  68.80 (  -2.08%)
Success 3 Max     74.00 (   0.00%)  78.00 (  -5.41%)  70.00 (   5.41%)  72.00 (   2.70%)

                                4.4-rc2     4.4-rc2     4.4-rc2     4.4-rc2
                                 0-test      1-test      2-test      3-test
User                            3167.23     3140.58     3198.77     3049.85
System                          1166.65     1158.64     1171.06     1140.18
Elapsed                         1827.63     1737.69     1750.62     1793.82

                                4.4-rc2     4.4-rc2     4.4-rc2     4.4-rc2
                                 0-test      1-test      2-test      3-test
Minor Faults                  107184766   107311664   107366319   108425875
Major Faults                        753         730         746         817
Swap Ins                            188         346         243         287
Swap Outs                          7278        6186        6226        5702
Allocation stalls                   988         868        1104         846
DMA allocs                           25          18          15          13
DMA32 allocs                   75074785    75104070    75131502    76260816
Normal allocs                  26112454    26193770    26142374    26291337
Movable allocs                        0           0           0           0
Direct pages scanned              83996       82251       80523       93509
Kswapd pages scanned            2122511     2107947     2110599     2121951
Kswapd pages reclaimed          2031597     2006468     2011184     2052483
Direct pages reclaimed            83806       82162       80315       93275
Kswapd efficiency                   95%         95%         95%         96%
Kswapd velocity                1217.211    1202.789    1211.116    1189.075
Direct efficiency                   99%         99%         99%         99%
Direct velocity                  48.170      46.932      46.206      52.400
Percentage direct scans              3%          3%          3%          4%
Zone normal velocity            301.196     301.273     297.286     308.598
Zone dma32 velocity             964.185     948.448     960.036     932.877
Zone dma velocity                 0.000       0.000       0.000       0.000
Page writes by reclaim         7296.200    6187.400    6226.800    5702.600
Page writes file                     18           1           0           0
Page writes anon                   7278        6186        6226        5702
Page reclaim immediate              259         225          41         180
Sector Reads                    4132945     4074422     4099737     4291996
Sector Writes                  11066128    11057103    11066448    11083256
Page rescued immediate                0           0           0           0
Slabs scanned                   1539471     1521153     1518145     1776426
Direct inode steals                8482        3717        6096        9832
Kswapd inode steals               37735       42700       39976       43492
Kswapd skipped wait                   0           0           0           0
THP fault alloc                     593         610         680         778
THP collapse alloc                  340         294         335         393
THP splits                            4           2           4           3
THP fault fallback                  751         748         705         626
THP collapse fail                    14          16          14          12
Compaction stalls                  6464        6373        6743        6451
Compaction success                  518         688         575         972
Compaction failures                5945        5684        6167        5479
Page migrate success             318176      313488      239637      595224
Page migrate failure              40983       46106       12171        2587
Compaction pages isolated        733684      735737      564719      713799
Compaction migrate scanned      1101427     1056870      603977      969346
Compaction free scanned        17736383    15328486    11999748     5269641
Compaction cost                     352         347         263         638
NUMA alloc hit                 99632716    99690283    99753018   100771746
NUMA alloc miss                       0           0           0           0
NUMA interleave hit                   0           0           0           0
NUMA alloc local               99632716    99690283    99753018   100771746
NUMA base PTE updates                 0           0           0           0
NUMA huge PMD updates                 0           0           0           0
NUMA page range updates               0           0           0           0
NUMA hint faults                      0           0           0           0
NUMA hint local faults                0           0           0           0
NUMA hint local percent             100         100         100         100
NUMA pages migrated                   0           0           0           0
AutoNUMA cost                        0%          0%          0%          0%

Migrate scanned pages are reduced by patch 2 as expected, thanks to the
skipping. Patch 3 reduces free scanned pages significantly, and
improves compaction success and THP fault allocs (of the interfering
activity, not the alloc test itself). That results in more migration
scanner activity, as more success means less deferring, and time
previously spent in the free scanner can now be used in the migration
scanner. "Success 3" is an indication of long-term fragmentation (the
interference has ceased in this phase). It looks quite unstable overall
(there shouldn't be such a difference between base and patch 1), but it
doesn't seem to have decreased.
I suspect it's the lack of reset_isolation_suitable() when the only
activity is async compaction. Needs more evaluation.

Aaron, could you try this on your testcase?

[1] https://lkml.org/lkml/2014/7/16/988
[2] http://www.spinics.net/lists/linux-mm/msg97378.html
[3] http://www.spinics.net/lists/linux-mm/msg97475.html

Vlastimil Babka (3):
  mm, compaction: reduce spurious pcplist drains
  mm, compaction: make async direct compaction skip blocks where isolation fails
  mm, compaction: direct freepage allocation for async direct compaction

 include/linux/vm_event_item.h |   1 +
 mm/compaction.c               | 122 +++++++++++++++++++++++++++++++++++------
 mm/internal.h                 |   4 ++
 mm/page_alloc.c               |  27 ++++++++++
 mm/vmstat.c                   |   2 +
 5 files changed, 137 insertions(+), 19 deletions(-)

-- 
2.6.3