On Mon 13-05-13 09:12:33, Mel Gorman wrote:
> Simplistically, the anon and file LRU lists are scanned proportionally
> depending on the value of vm.swappiness, although there are other
> factors taken into account by get_scan_count(). The patch "mm: vmscan:
> Limit the number of pages kswapd reclaims" limits the number of pages
> kswapd reclaims, but it breaks this proportional scanning and may evenly
> shrink the anon/file LRUs regardless of vm.swappiness.
>
> This patch preserves the proportional scanning and reclaim. It does mean
> that kswapd will reclaim more than requested, but the number of pages
> will be related to the high watermark.
>
> [mhocko@xxxxxxx: Correct proportional reclaim for memcg and simplify]
> [kamezawa.hiroyu@xxxxxxxxxxxxxx: Recalculate scan based on target]
> [hannes@xxxxxxxxxxx: Account for already scanned pages properly]
> Signed-off-by: Mel Gorman <mgorman@xxxxxxx>
> Acked-by: Rik van Riel <riel@xxxxxxxxxx>

Active vs. inactive might get skewed a bit AFAICS because both of them
are zeroed, but file vs. anon should now be scanned proportionally based
on swappiness, which sounds good enough.

Reviewed-by: Michal Hocko <mhocko@xxxxxxx>
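To make the arithmetic in the quoted hunk below concrete, here is a
minimal standalone sketch of the proportional adjustment, with made-up
numbers (plain userspace C; the values and variable names are
illustrative, this is not the kernel code itself):

	#include <stdio.h>

	int main(void)
	{
		/* Hypothetical targets from get_scan_count(); +1 avoids div-by-zero */
		unsigned long target_anon = 100 + 1;
		unsigned long target_file = 400 + 1;

		/* Hypothetical remaining counts after several scan iterations */
		unsigned long nr_anon = 40;	/* anon is the smaller LRU, stop it */
		unsigned long nr_file = 300;

		/* Percentage of the anon target still unscanned when anon stops */
		unsigned long percentage = nr_anon * 100 / target_anon;	/* 39 */

		/* File pages already scanned so far */
		unsigned long nr_scanned = target_file - nr_file;	/* 101 */

		/*
		 * Scale the file LRU to the same progress point: in total it
		 * should scan (100 - percentage)% of its target, minus what
		 * has been scanned already (mirrors nr[lru] -= min(...)).
		 */
		unsigned long remaining = target_file * (100 - percentage) / 100;
		remaining -= (nr_scanned < remaining) ? nr_scanned : remaining;

		printf("anon stops with %lu%% of its target unscanned\n", percentage);
		printf("file still scans %lu pages, then stops too\n", remaining);
		return 0;
	}

With these numbers both LRUs end up having scanned roughly 61% of their
original targets, which is the invariant the patch is after.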
> ---
>  mm/vmscan.c | 67 +++++++++++++++++++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 59 insertions(+), 8 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index cdbc069..26ad67f 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1822,17 +1822,25 @@ out:
>  static void shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
>  {
>  	unsigned long nr[NR_LRU_LISTS];
> +	unsigned long targets[NR_LRU_LISTS];
>  	unsigned long nr_to_scan;
>  	enum lru_list lru;
>  	unsigned long nr_reclaimed = 0;
>  	unsigned long nr_to_reclaim = sc->nr_to_reclaim;
>  	struct blk_plug plug;
> +	bool scan_adjusted = false;
>
>  	get_scan_count(lruvec, sc, nr);
>
> +	/* Record the original scan target for proportional adjustments later */
> +	memcpy(targets, nr, sizeof(nr));
> +
>  	blk_start_plug(&plug);
>  	while (nr[LRU_INACTIVE_ANON] || nr[LRU_ACTIVE_FILE] ||
>  					nr[LRU_INACTIVE_FILE]) {
> +		unsigned long nr_anon, nr_file, percentage;
> +		unsigned long nr_scanned;
> +
>  		for_each_evictable_lru(lru) {
>  			if (nr[lru]) {
>  				nr_to_scan = min(nr[lru], SWAP_CLUSTER_MAX);
> @@ -1842,17 +1850,60 @@ static void shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
>  							    lruvec, sc);
>  			}
>  		}
> +
> +		if (nr_reclaimed < nr_to_reclaim || scan_adjusted)
> +			continue;
> +
>  		/*
> -		 * On large memory systems, scan >> priority can become
> -		 * really large. This is fine for the starting priority;
> -		 * we want to put equal scanning pressure on each zone.
> -		 * However, if the VM has a harder time of freeing pages,
> -		 * with multiple processes reclaiming pages, the total
> -		 * freeing target can get unreasonably large.
> +		 * For global direct reclaim, reclaim only the number of pages
> +		 * requested. Less care is taken to scan proportionally as it
> +		 * is more important to minimise direct reclaim stall latency
> +		 * than it is to properly age the LRU lists.
>  		 */
> -		if (nr_reclaimed >= nr_to_reclaim &&
> -		    sc->priority < DEF_PRIORITY)
> +		if (global_reclaim(sc) && !current_is_kswapd())
>  			break;
> +
> +		/*
> +		 * For kswapd and memcg, reclaim at least the number of pages
> +		 * requested. Ensure that the anon and file LRUs shrink
> +		 * proportionally to what was requested by get_scan_count().
> +		 * We stop reclaiming one LRU and reduce the amount of
> +		 * scanning proportional to the original scan target.
> +		 */
> +		nr_file = nr[LRU_INACTIVE_FILE] + nr[LRU_ACTIVE_FILE];
> +		nr_anon = nr[LRU_INACTIVE_ANON] + nr[LRU_ACTIVE_ANON];
> +
> +		if (nr_file > nr_anon) {
> +			unsigned long scan_target = targets[LRU_INACTIVE_ANON] +
> +						targets[LRU_ACTIVE_ANON] + 1;
> +			lru = LRU_BASE;
> +			percentage = nr_anon * 100 / scan_target;
> +		} else {
> +			unsigned long scan_target = targets[LRU_INACTIVE_FILE] +
> +						targets[LRU_ACTIVE_FILE] + 1;
> +			lru = LRU_FILE;
> +			percentage = nr_file * 100 / scan_target;
> +		}
> +
> +		/* Stop scanning the smaller of the two LRUs */
> +		nr[lru] = 0;
> +		nr[lru + LRU_ACTIVE] = 0;
> +
> +		/*
> +		 * Recalculate the other LRU scan count based on its original
> +		 * scan target and the percentage of scanning already complete
> +		 */
> +		lru = (lru == LRU_FILE) ? LRU_BASE : LRU_FILE;
> +		nr_scanned = targets[lru] - nr[lru];
> +		nr[lru] = targets[lru] * (100 - percentage) / 100;
> +		nr[lru] -= min(nr[lru], nr_scanned);
> +
> +		lru += LRU_ACTIVE;
> +		nr_scanned = targets[lru] - nr[lru];
> +		nr[lru] = targets[lru] * (100 - percentage) / 100;
> +		nr[lru] -= min(nr[lru], nr_scanned);
> +
> +		scan_adjusted = true;
>  	}
>  	blk_finish_plug(&plug);
>  	sc->nr_reclaimed += nr_reclaimed;
> --
> 1.8.1.4
>

--
Michal Hocko
SUSE Labs