Re: issue with direct reclaims and kswapd reclaims on 2.6.35.7

Jeffrey Vanhoof <jdv1029@xxxxxxxxx> · Thu, 18 Aug 2011 01:47:19 -0500

w.r.t.:
> 1) direct reclaims occurring quite frequently, resulting in delayed
> file read requests
> 2) direct reclaims falling into congestion_wait() even though no
> congestion at the time, this results in video jitter.

Backporting the following changes seemed to greatly improve these issues:
-vmscan: synchronous lumpy reclaim should not call congestion_wait()
-writeback: do not sleep on the congestion queue if there are no
congested BDIs or if significant congestion is not being encountered
in the current zone
-vmscan: avoid setting zone congested if no page dirty
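
As a rough illustration of the check introduced by the second change
above, here is a standalone userspace sketch. It is not the kernel
code: struct model_zone, model_nr_bdi_congested and
model_should_sleep() are hypothetical stand-ins for the kernel state.
The point is only that the caller sleeps on the congestion queue when
at least one BDI is congested and the zone has been flagged as
congested; otherwise it just yields and carries on.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-zone state; not the kernel's struct zone. */
struct model_zone {
	bool reclaim_congested;	/* models zone_is_reclaim_congested(zone) */
};

/* Hypothetical stand-in for atomic_read(&nr_bdi_congested[sync]). */
int model_nr_bdi_congested;

/*
 * Returns true if the caller should sleep on the congestion queue,
 * false if it should only yield the CPU and continue reclaiming.
 */
bool model_should_sleep(const struct model_zone *zone)
{
	/* No congested BDI, or this zone is not congested: do not sleep. */
	if (model_nr_bdi_congested == 0 || !zone->reclaim_congested)
		return false;

	return true;
}
```

With model_nr_bdi_congested set to 0 the function returns false for any
zone, which corresponds to the "no congestion at the time" case from
item 2.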

w.r.t.:
> 3) kswapd not reclaiming pages quickly enough due to falling into
> congestion_wait() very often. (or stays in congestion_wait() for too long)

The attached patch takes an idea from Mel Gorman's patch for
"writeback: do not sleep on the congestion queue if there are no
congested BDIs or if significant congestion is not being encountered
in the current zone" and applies it around the congestion_wait() in
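balance_pgdat(). The idea is that when no zone is congested, kswapd
should not sleep for the full HZ/10: the patch shortens the nap to
HZ/50 in that case, as long as priority is still above
DEF_PRIORITY - 8. Comments or alternative solutions would be
appreciated.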
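
To make the intended behaviour easier to review, here is a small
userspace sketch of the nap selection the patch aims for in
balance_pgdat(). It is a model under stated assumptions, not the patch
itself: model_kswapd_nap() and the MODEL_ names are hypothetical,
MODEL_HZ = 100 is an assumption (HZ depends on the kernel
configuration), and the return value is only the upper bound passed to
congestion_wait(), not the time actually slept.

```c
#include <stdbool.h>

#define MODEL_DEF_PRIORITY	12	/* mirrors DEF_PRIORITY in mm/vmscan.c */
#define MODEL_HZ		100	/* assumption: HZ is config dependent */

/*
 * Returns the congestion_wait() timeout in jiffies that kswapd would
 * use at this point in the loop, or 0 if it does not nap at all.
 */
long model_kswapd_nap(unsigned long total_scanned, int priority,
		      bool has_under_min_watermark_zone,
		      bool any_zone_congested)
{
	/* The nap is only considered once kswapd is "getting into trouble". */
	if (!total_scanned || priority >= MODEL_DEF_PRIORITY - 2)
		return 0;

	/* A zone under its min watermark: skip the wait entirely. */
	if (has_under_min_watermark_zone)
		return 0;

	/* No congestion and priority still fairly high: short nap. */
	if (!any_zone_congested && priority > MODEL_DEF_PRIORITY - 8)
		return MODEL_HZ / 50;

	/* Otherwise keep the existing behaviour. */
	return MODEL_HZ / 10;
}
```

With the assumed HZ of 100 this gives a nap of at most 2 jiffies
(20 ms) instead of 10 jiffies (100 ms) when nothing is congested.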

Thanks,
Jeff Vanhoof

commit af8ebaca0d367e14b49a151731e8a3e9bc6685f1
Author: Jeff Vanhoof <jdv1029@xxxxxxxxx>
Date:   Tue Aug 16 00:14:31 2011 -0500

    linux-mm: Improve kswapd reclamation of memory
    
    This is a workaround to improve the number of pages reclaimed
    in kswapd so that direct reclaims and iowaits are minimized.
    
    Change-Id: I491e9c80809b5ec3e1e7807742807a2317fc2394

diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index fa79632..e5b6d3d 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -286,6 +286,7 @@ void clear_bdi_congested(struct backing_dev_info *bdi, int sync);
 void set_bdi_congested(struct backing_dev_info *bdi, int sync);
 long congestion_wait(int sync, long timeout);
 long wait_iff_congested(struct zone *zone, int sync, long timeout);
+int query_iff_congested(struct zone *zone, int sync);
 
 static inline bool bdi_cap_writeback_dirty(struct backing_dev_info *bdi)
 {
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 4254946..fcf7976 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -850,3 +850,32 @@ out:
 	return ret;
 }
 EXPORT_SYMBOL(wait_iff_congested);
+
+/**
+ * query_iff_congested - Checks if a backing_dev (any backing_dev) is
+ *     congested and the given @zone has experienced recent congestion.
+ * @zone: A zone to check if it is heavily congested
+ * @sync: SYNC or ASYNC IO
+ *
+ * The return value is 1 only if at least one backing_dev is congested and
+ * @zone is also flagged as congested; otherwise 0 is returned.
+ *
+ */
+int query_iff_congested(struct zone *zone, int sync)
+{
+	int ret = 1;
+	DEFINE_WAIT(wait);
+	wait_queue_head_t *wqh = &congestion_wqh[sync];
+
+	/*
+	 * If there is no congestion, or heavy congestion is not being
+	 * encountered in the current zone, set ret to 0
+	 */
+	if (atomic_read(&nr_bdi_congested[sync]) == 0 ||
+			!zone_is_reclaim_congested(zone)) {
+		ret = 0;
+	}
+
+	return ret;
+}
+EXPORT_SYMBOL(query_iff_congested);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 1bd01ee..e8686f0 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2169,6 +2169,7 @@ loop_again:
 		int end_zone = 0;	/* Inclusive.  0 = ZONE_DMA */
 		unsigned long lru_pages = 0;
 		int has_under_min_watermark_zone = 0;
+		int any_zone_congested = 0;
 
 		/* The swap token gets in the way of swapout... */
 		if (!priority)
@@ -2294,6 +2295,13 @@ loop_again:
 		}
 		if (all_zones_ok)
 			break;		/* kswapd: all done */
+
+		/* Check to see if any zones are congested */
+		for (i = pgdat->nr_zones - 1; i >= 0; i--) {
+			struct zone *zone = pgdat->node_zones + i;
+			any_zone_congested |=
+				 query_iff_congested(zone, BLK_RW_ASYNC);
+		}
+
 		/*
 		 * OK, kswapd is getting into trouble.  Take a nap, then take
 		 * another pass across the zones.
@@ -2301,6 +2309,9 @@ loop_again:
 		if (total_scanned && (priority < DEF_PRIORITY - 2)) {
 			if (has_under_min_watermark_zone)
 				count_vm_event(KSWAPD_SKIP_CONGESTION_WAIT);
+			else if (!any_zone_congested &&
+				 (priority > DEF_PRIORITY - 8))
+				congestion_wait(BLK_RW_ASYNC, HZ/50);
 			else
 				congestion_wait(BLK_RW_ASYNC, HZ/10);
 		}