w.r.t.:
> 1) direct reclaims occurring quite frequently, resulting in delayed
> file read requests
> 2) direct reclaims falling into congestion_wait() even though no
> congestion at the time, this results in video jitter.

Backporting the following changes seemed to greatly improve these issues:
- vmscan: synchronous lumpy reclaim should not call congestion_wait()
- writeback: do not sleep on the congestion queue if there are no
  congested BDIs or if significant congestion is not being encountered
  in the current zone
- vmscan: avoid setting zone congested if no page dirty

w.r.t.:
> 3) kswapd not reclaiming pages quickly enough due to falling into
> congestion_wait() very often. (or stays in congestion_wait() for too long)

The attached patch takes an idea from Mel Gorman's patch "writeback: do
not sleep on the congestion queue if there are no congested BDIs or if
significant congestion is not being encountered in the current zone" and
applies it around the congestion_wait() in balance_pgdat(). The idea is
that if there is no congestion, kswapd should avoid potentially waiting
for too long.

Comments or alternate solutions would be appreciated.

Thanks,
Jeff Vanhoof

commit af8ebaca0d367e14b49a151731e8a3e9bc6685f1
Author: Jeff Vanhoof <jdv1029@xxxxxxxxx>
Date:   Tue Aug 16 00:14:31 2011 -0500

    linux-mm: Improve kswapd reclaimation of memory

    This is a workaround to improve the number of pages reclaimed
    in kswapd so that direct reclaims and iowaits are minimized.

    Change-Id: I491e9c80809b5ec3e1e7807742807a2317fc2394

diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index fa79632..e5b6d3d 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -286,6 +286,7 @@ void clear_bdi_congested(struct backing_dev_info *bdi, int sync);
 void set_bdi_congested(struct backing_dev_info *bdi, int sync);
 long congestion_wait(int sync, long timeout);
 long wait_iff_congested(struct zone *zone, int sync, long timeout);
+int query_iff_congested(struct zone *zone, int sync);
 
 static inline bool bdi_cap_writeback_dirty(struct backing_dev_info *bdi)
 {
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 4254946..fcf7976 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -850,3 +850,32 @@ out:
 	return ret;
 }
 EXPORT_SYMBOL(wait_iff_congested);
+
+/**
+ * query_iff_congested - Checks if a backing_dev (any backing_dev) is
+ * congested or if the given @zone has has experienced recent congestion.
+ * @zone: A zone to check if it is heavily congested
+ * @sync: SYNC or ASYNC IO
+ *
+ * The return value is 1 if either backing_dev (any) or @zone is congested,
+ * otherwise 0 is returned.
+ *
+ */
+int query_iff_congested(struct zone *zone, int sync)
+{
+	long ret = 1;
+	DEFINE_WAIT(wait);
+	wait_queue_head_t *wqh = &congestion_wqh[sync];
+
+	/*
+	 * If there is no congestion, or heavy congestion is not being
+	 * encountered in the current zone, set ret to 0
+	 */
+	if (atomic_read(&nr_bdi_congested[sync]) == 0 ||
+			!zone_is_reclaim_congested(zone)) {
+		ret = 0;
+	}
+
+	return ret;
+}
+EXPORT_SYMBOL(query_iff_congested);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 1bd01ee..e8686f0 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2169,6 +2169,7 @@ loop_again:
 		int end_zone = 0;	/* Inclusive. 0 = ZONE_DMA */
 		unsigned long lru_pages = 0;
 		int has_under_min_watermark_zone = 0;
+		int any_zone_congested = 0;
 
 		/* The swap token gets in the way of swapout... */
 		if (!priority)
@@ -2294,6 +2295,13 @@ loop_again:
 		}
 		if (all_zones_ok)
 			break;		/* kswapd: all done */
+
+		/* Check to see if any zones are congested */
+		for (i = pgdat->nr_zones - 1; i >= 0; i--) {
+			struct zone *zone = pgdat->node_zones + i;
+			any_zone_congested |=
+				query_iff_congested(zone, BLK_RW_ASYNC);
+		}
+
 		/*
 		 * OK, kswapd is getting into trouble. Take a nap, then take
 		 * another pass across the zones.
@@ -2301,6 +2309,9 @@ loop_again:
 		if (total_scanned && (priority < DEF_PRIORITY - 2)) {
 			if (has_under_min_watermark_zone)
 				count_vm_event(KSWAPD_SKIP_CONGESTION_WAIT);
+			else if (!any_zone_congested &&
+					(priority > DEF_PRIORITY - 8))
+				congestion_wait(BLK_RW_ASYNC, HZ/50);
 			else
 				congestion_wait(BLK_RW_ASYNC, HZ/10);
 		}
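
For anyone who wants to reason about the new nap decision without booting a kernel, below is a rough userspace sketch of the logic in the patch. It is only a model, not kernel code: the names model_query_iff_congested(), model_kswapd_nap(), MODEL_HZ and MODEL_DEF_PRIORITY are made up for illustration, MODEL_HZ = 100 is an assumed tick rate, and the zone and BDI state is passed in as plain arguments instead of being read from the kernel's data structures. Note that, as coded, the query returns 1 only when a BDI is congested and the zone is flagged as reclaim-congested.

```c
#include <stdbool.h>

/* Illustrative stand-ins (assumptions); the real values come from kernel headers. */
#define MODEL_HZ		100	/* assumed tick rate, for illustration only */
#define MODEL_DEF_PRIORITY	12	/* mirrors DEF_PRIORITY in mm/vmscan.c */

/*
 * Model of query_iff_congested() as coded in the patch: returns 1 only
 * when at least one BDI is congested AND the zone is reclaim-congested.
 */
static int model_query_iff_congested(int nr_bdi_congested, bool zone_congested)
{
	if (nr_bdi_congested == 0 || !zone_congested)
		return 0;
	return 1;
}

/*
 * Model of the nap decision in balance_pgdat() after the patch.
 * Returns the congestion_wait() timeout in jiffies, or 0 when no
 * wait is performed.
 */
static long model_kswapd_nap(unsigned long total_scanned, int priority,
			     bool has_under_min_watermark_zone,
			     int any_zone_congested)
{
	if (!total_scanned || priority >= MODEL_DEF_PRIORITY - 2)
		return 0;		/* kswapd does not nap at all */
	if (has_under_min_watermark_zone)
		return 0;		/* KSWAPD_SKIP_CONGESTION_WAIT path */
	if (!any_zone_congested && priority > MODEL_DEF_PRIORITY - 8)
		return MODEL_HZ / 50;	/* new, shorter nap */
	return MODEL_HZ / 10;		/* original nap */
}
```

In this model the shorter HZ/50 nap is only taken for priorities 5 through 9 when no zone reported congestion; at priority 4 and below, or when any zone is congested, the original HZ/10 wait still applies.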