From: Zlatko Calusic <zlatko.calusic@xxxxxxxx>

Currently we take a short nap (HZ/10) and wait for congestion to clear
before taking another pass with lower priority in balance_pgdat(). But
we do that only for the highest zone we encounter that is unbalanced
and congested.

This patch changes that to wait on all congested zones in a single
pass, in the hope that it will save us some scanning. Also, we now
take the nap as soon as a congested zone is encountered and
sc.priority < DEF_PRIORITY - 2 (a.k.a. kswapd in trouble).

Cc: Mel Gorman <mgorman@xxxxxxx>
Cc: Hugh Dickins <hughd@xxxxxxxxxx>
Cc: Minchan Kim <minchan.kim@xxxxxxxxx>
Signed-off-by: Zlatko Calusic <zlatko.calusic@xxxxxxxx>
---

The patch is against the mm tree. Make sure that
mm-avoid-calling-pgdat_balanced-needlessly.patch is applied first (it
is not yet in the mmotm tree).

Tested on half a dozen systems with different workloads for the last
few days, working really well!

(For illustration, a small userspace sketch of the new per-zone
decision logic follows the patch.)

 mm/vmscan.c | 35 ++++++++++++-----------------------
 1 file changed, 12 insertions(+), 23 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 002ade6..1c5d38a 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2565,7 +2565,6 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order,
 							int *classzone_idx)
 {
 	bool pgdat_is_balanced = false;
-	struct zone *unbalanced_zone;
 	int i;
 	int end_zone = 0;	/* Inclusive.  0 = ZONE_DMA */
 	unsigned long total_scanned;
@@ -2596,9 +2595,6 @@ loop_again:
 
 	do {
 		unsigned long lru_pages = 0;
-		int has_under_min_watermark_zone = 0;
-
-		unbalanced_zone = NULL;
 
 		/*
 		 * Scan in the highmem->dma direction for the highest
@@ -2739,15 +2735,20 @@ loop_again:
 			}
 
 			if (!zone_balanced(zone, testorder, 0, end_zone)) {
-				unbalanced_zone = zone;
-				/*
-				 * We are still under min water mark. This
-				 * means that we have a GFP_ATOMIC allocation
-				 * failure risk. Hurry up!
-				 */
+				if (total_scanned && sc.priority < DEF_PRIORITY - 2) {
+					/* OK, kswapd is getting into trouble. */
 				if (!zone_watermark_ok_safe(zone, order,
 					    min_wmark_pages(zone), end_zone, 0))
-					has_under_min_watermark_zone = 1;
+					/*
+					 * We are still under min water mark.
+					 * This means that we have a GFP_ATOMIC
+					 * allocation failure risk. Hurry up!
+					 */
+					count_vm_event(KSWAPD_SKIP_CONGESTION_WAIT);
+				else
+					/* Take a nap if a zone is congested. */
+					wait_iff_congested(zone, BLK_RW_ASYNC, HZ/10);
+				}
 			} else {
 				/*
 				 * If a zone reaches its high watermark,
@@ -2758,7 +2759,6 @@ loop_again:
 				 */
 				zone_clear_flag(zone, ZONE_CONGESTED);
 			}
-
 		}
 
 		/*
@@ -2776,17 +2776,6 @@ loop_again:
 		}
 
 		/*
-		 * OK, kswapd is getting into trouble. Take a nap, then take
-		 * another pass across the zones.
-		 */
-		if (total_scanned && (sc.priority < DEF_PRIORITY - 2)) {
-			if (has_under_min_watermark_zone)
-				count_vm_event(KSWAPD_SKIP_CONGESTION_WAIT);
-			else if (unbalanced_zone)
-				wait_iff_congested(unbalanced_zone, BLK_RW_ASYNC, HZ/10);
-		}
-
-		/*
 		 * We do this so kswapd doesn't build up large priorities for
 		 * example when it is freeing in parallel with allocators. It
 		 * matches the direct reclaim path behaviour in terms of impact
-- 
1.8.1

-- 
Zlatko
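
P.S. Here is a minimal userspace C sketch of the per-zone decision the
patched loop now makes, for anyone who wants to poke at the control
flow outside the kernel. Only DEF_PRIORITY, the DEF_PRIORITY - 2
threshold, and the three outcomes taken from the patch (skip the wait
and count KSWAPD_SKIP_CONGESTION_WAIT, nap in wait_iff_congested(),
clear ZONE_CONGESTED) come from the code above; struct zone_state and
decide_action() are hypothetical simplifications, not kernel API.

	#include <stdio.h>
	#include <stdbool.h>

	#define DEF_PRIORITY 12	/* default scan priority, as in the kernel */

	/* Hypothetical, simplified view of the state kswapd inspects per zone. */
	struct zone_state {
		bool balanced;			/* what zone_balanced() reports */
		bool under_min_watermark;	/* !zone_watermark_ok_safe(..., min_wmark) */
	};

	/*
	 * Mirrors the patched per-zone logic: once something has been
	 * scanned and priority has dropped below DEF_PRIORITY - 2, react
	 * to each unbalanced zone immediately - skip the wait when a
	 * GFP_ATOMIC failure is at risk, otherwise nap on that zone's
	 * congestion.
	 */
	static const char *decide_action(const struct zone_state *z,
					 unsigned long total_scanned, int priority)
	{
		if (z->balanced)
			return "clear ZONE_CONGESTED";
		if (total_scanned && priority < DEF_PRIORITY - 2) {
			if (z->under_min_watermark)
				return "count KSWAPD_SKIP_CONGESTION_WAIT, don't wait";
			return "wait_iff_congested(zone, BLK_RW_ASYNC, HZ/10)";
		}
		return "no action this pass";
	}

	int main(void)
	{
		const struct zone_state zones[] = {
			{ .balanced = true,  .under_min_watermark = false },
			{ .balanced = false, .under_min_watermark = true  },
			{ .balanced = false, .under_min_watermark = false },
		};
		int priority = DEF_PRIORITY - 3;	/* "kswapd in trouble" */

		for (size_t i = 0; i < sizeof(zones) / sizeof(zones[0]); i++)
			printf("zone %zu: %s\n", i,
			       decide_action(&zones[i], 1, priority));
		return 0;
	}

The point of the patch shows up here as decide_action() being applied
to every zone inside the loop, rather than once at the end of the pass
on a single remembered unbalanced_zone.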