On Mon, Jul 21, 2014 at 05:54:11PM +0200, Michal Hocko wrote:
> From: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
>
> commit b738d764652dc5aab1c8939f637112981fce9e0e upstream
>
> shrink_inactive_list() used to wait 0.1s to avoid congestion when all
> the pages that were isolated from the inactive list were dirty but not
> under active writeback. That makes no real sense, and apparently causes
> major interactivity issues under some loads since 3.11.
>
> The ostensible reason for it was to wait for kswapd to start writing
> pages, but that seems questionable as well, since the congestion wait
> code seems to trigger for kswapd itself as well. Also, the logic behind
> delaying anything when we haven't actually started writeback is not
> clear - it only delays actually starting that writeback.
>
> We'll still trigger the congestion waiting if
>
> (a) the process is kswapd, and we hit pages flagged for immediate
>     reclaim
>
> (b) the process is not kswapd, and the zone backing dev writeback is
>     actually congested.
>
> This probably needs to be revisited, but as it is this fixes a reported
> regression.
>
> [mhocko@xxxxxxx: backport to 3.12 stable tree]
> Fixes: e2be15f6c3ee ('mm: vmscan: stall page reclaim and writeback pages based on dirty/writepage pages encountered')

This seems to be applicable to the 3.11 kernel as well. If there are no
objections, I'll queue it.

Cheers,
--
Luís

> Reported-by: Felipe Contreras <felipe.contreras@xxxxxxxxx>
> Pinpointed-by: Hillf Danton <dhillf@xxxxxxxxx>
> Cc: Michal Hocko <mhocko@xxxxxxx>
> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Cc: Mel Gorman <mgorman@xxxxxxx>
> Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> Signed-off-by: Michal Hocko <mhocko@xxxxxxx>
> ---
>  mm/vmscan.c | 11 +++++------
>  1 file changed, 5 insertions(+), 6 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 1d891f49587b..5ad29b2925a0 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1522,19 +1522,18 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
>  		 * If dirty pages are scanned that are not queued for IO, it
>  		 * implies that flushers are not keeping up. In this case, flag
>  		 * the zone ZONE_TAIL_LRU_DIRTY and kswapd will start writing
> -		 * pages from reclaim context. It will forcibly stall in the
> -		 * next check.
> +		 * pages from reclaim context.
>  		 */
>  		if (nr_unqueued_dirty == nr_taken)
>  			zone_set_flag(zone, ZONE_TAIL_LRU_DIRTY);
>
>  		/*
> -		 * In addition, if kswapd scans pages marked marked for
> -		 * immediate reclaim and under writeback (nr_immediate), it
> -		 * implies that pages are cycling through the LRU faster than
> +		 * If kswapd scans pages marked marked for immediate
> +		 * reclaim and under writeback (nr_immediate), it implies
> +		 * that pages are cycling through the LRU faster than
>  		 * they are written so also forcibly stall.
>  		 */
> -		if (nr_unqueued_dirty == nr_taken || nr_immediate)
> +		if (nr_immediate)
>  			congestion_wait(BLK_RW_ASYNC, HZ/10);
>  	}
>
> --
> 2.0.1
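
[Editor's note: for readers skimming the thread, below is a rough sketch
of the two stall paths that survive this patch, matching conditions (a)
and (b) in the commit message. It is a simplified illustration, not the
literal mm/vmscan.c source: the surrounding bookkeeping in
shrink_inactive_list() is omitted, the current_is_kswapd() guard on the
first check follows the commit message's wording rather than the exact
placement in the diff, and variable names are taken from the quoted
hunk.]

	/*
	 * (a) kswapd found pages flagged for immediate reclaim that
	 *     are still under writeback: the LRU is cycling faster
	 *     than IO completes, so sleep unconditionally for up to
	 *     100ms (HZ/10 jiffies).
	 */
	if (current_is_kswapd() && nr_immediate)
		congestion_wait(BLK_RW_ASYNC, HZ/10);

	/*
	 * (b) direct reclaim: stall only if the zone's backing device
	 *     is actually congested; wait_iff_congested() returns
	 *     immediately when it is not.
	 */
	if (!current_is_kswapd())
		wait_iff_congested(zone, BLK_RW_ASYNC, HZ/10);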