On Wed, Jul 16, 2014 at 03:48:10PM +0200, Vlastimil Babka wrote:
> When direct sync compaction is often unsuccessful, it may become deferred for
> some time to avoid further useless attempts, both sync and async. Successful
> high-order allocations un-defer compaction, while further unsuccessful
> compaction attempts prolong the compaction deferred period.
>
> Currently the checking and setting deferred status is performed only on the
> preferred zone of the allocation that invoked direct compaction. But compaction
> itself is attempted on all eligible zones in the zonelist, so the behavior is
> suboptimal and may lead both to scenarios where 1) compaction is attempted
> uselessly, or 2) where it's not attempted despite good chances of succeeding,
> as shown in the examples below:
>
> 1) A direct compaction with Normal preferred zone failed and set deferred
>    compaction for the Normal zone. Another unrelated direct compaction with
>    DMA32 as preferred zone will attempt to compact DMA32 zone even though
>    the first compaction attempt also included DMA32 zone.
>
>    In another scenario, compaction with Normal preferred zone failed to compact
>    Normal zone, but succeeded in the DMA32 zone, so it will not defer
>    compaction. In the next attempt, it will try Normal zone which will fail
>    again, instead of skipping Normal zone and trying DMA32 directly.
>
> 2) Kswapd will balance DMA32 zone and reset defer status based on watermarks
>    looking good. A direct compaction with preferred Normal zone will skip
>    compaction of all zones including DMA32 because Normal was still deferred.
>    The allocation might have succeeded in DMA32, but won't.
>
> This patch makes compaction deferring work on an individual zone basis instead
> of the preferred zone. For each zone, it checks compaction_deferred() to decide
> if the zone should be skipped. If watermarks fail after compacting the zone,
> defer_compaction() is called.
> The zone where watermarks passed can still be
> deferred when the allocation attempt is unsuccessful. When allocation is
> successful, compaction_defer_reset() is called for the zone containing the
> allocated page. This approach should approximate calling defer_compaction()
> only on zones where compaction was attempted and did not yield an allocated
> page. There might be corner cases but that is inevitable as long as the
> decision to stop compacting does not guarantee that a page will be allocated.
>
> During testing on a two-node machine with a single very small Normal zone on
> node 1, this patch has improved success rates in stress-highalloc mmtests
> benchmark. The success rates here were previously made worse by commit
> 3a025760fc ("mm: page_alloc: spill to remote nodes before waking kswapd") as
> kswapd was no longer resetting the deferred compaction for the Normal zone
> often enough, and DMA32 zones on both nodes were thus not considered for
> compaction. On a different machine, success rates were improved with
> __GFP_NO_KSWAPD allocations.
>
> Signed-off-by: Vlastimil Babka <vbabka@xxxxxxx>
> Acked-by: Minchan Kim <minchan@xxxxxxxxxx>
> Reviewed-by: Zhang Yanfei <zhangyanfei@xxxxxxxxxxxxxx>
> Cc: Mel Gorman <mgorman@xxxxxxx>
> Cc: Joonsoo Kim <iamjoonsoo.kim@xxxxxxx>
> Cc: Michal Nazarewicz <mina86@xxxxxxxxxx>
> Cc: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx>
> Cc: Christoph Lameter <cl@xxxxxxxxx>
> Cc: Rik van Riel <riel@xxxxxxxxxx>
> Cc: David Rientjes <rientjes@xxxxxxxxxx>

Acked-by: Mel Gorman <mgorman@xxxxxxx>

-- 
Mel Gorman
SUSE Labs
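
Illustrative note, not part of the patch: the per-zone policy described in
the quoted changelog can be sketched as a small user-space model. Everything
below is an assumption made for illustration. The struct zone_model type, the
model_* helpers, the back-off cap and the "compactable" flag are hypothetical
and are not the kernel's struct zone or its compaction code. The model also
collapses "watermarks pass" and "allocation succeeds" into a single step,
which the changelog explicitly says are not the same thing. It replays
scenario 2: Normal is deferred, DMA32 is not, and the zonelist walk skips
only Normal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of a zone; NOT the kernel's struct zone. */
struct zone_model {
    const char *name;
    unsigned int defer_shift; /* back-off exponent, grows on each failure */
    unsigned int considered;  /* attempts seen since the last failure */
    bool compactable;         /* assumed: compaction here would succeed */
};

#define MODEL_MAX_DEFER_SHIFT 6 /* assumed cap on the back-off exponent */

/* Record a failed compaction: restart the count and lengthen the back-off. */
static void model_defer(struct zone_model *z)
{
    z->considered = 0;
    if (z->defer_shift < MODEL_MAX_DEFER_SHIFT)
        z->defer_shift++;
}

/* Return true while the zone should still be skipped. */
static bool model_deferred(struct zone_model *z)
{
    unsigned int limit = 1u << z->defer_shift;

    if (z->considered < limit)
        z->considered++;
    return z->considered < limit;
}

/* Clear the deferred state after a successful allocation from the zone. */
static void model_defer_reset(struct zone_model *z)
{
    z->considered = 0;
    z->defer_shift = 0;
}

/* Walk the zonelist, deciding per zone whether to skip, defer or reset. */
static struct zone_model *model_direct_compact(struct zone_model *zones[], int n)
{
    for (int i = 0; i < n; i++) {
        struct zone_model *z = zones[i];

        if (model_deferred(z))
            continue;             /* skip only this zone, not the whole list */
        if (z->compactable) {
            model_defer_reset(z); /* simplification: success implies a page */
            return z;
        }
        model_defer(z);           /* failed here, defer this zone only */
    }
    return NULL;
}

/* Scenario 2: returns 1 if DMA32 satisfied the request, -1 if no zone did. */
int model_scenario_two(void)
{
    struct zone_model normal = { "Normal", 0, 0, false };
    struct zone_model dma32  = { "DMA32",  0, 0, true  };
    struct zone_model *zonelist[] = { &normal, &dma32 };
    struct zone_model *hit;

    model_defer(&normal);         /* an earlier failure deferred Normal */
    hit = model_direct_compact(zonelist, 2);

    assert(normal.defer_shift == 1); /* Normal was skipped, not re-deferred */
    if (hit == NULL)
        return -1;
    return hit == &dma32 ? 1 : 0;
}
```

With a preferred-zone-only check, the deferred Normal zone would have caused
the whole walk to be skipped; in this model the walk continues to DMA32.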