The patch titled
     Subject: mm: vmscan: do not iterate all mem cgroups for global direct reclaim
has been added to the -mm tree.  Its filename is
     mm-vmscan-do-not-iterate-all-mem-cgroups-for-global-direct-reclaim.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-vmscan-do-not-iterate-all-mem-cgroups-for-global-direct-reclaim.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-vmscan-do-not-iterate-all-mem-cgroups-for-global-direct-reclaim.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Yang Shi <yang.shi@xxxxxxxxxxxxxxxxx>
Subject: mm: vmscan: do not iterate all mem cgroups for global direct reclaim

In the current implementation, both kswapd and direct reclaim have to
iterate all mem cgroups.  This was not a problem before offline mem
cgroups could be iterated, but now that offline mem cgroups are walked
as well it can be very time consuming.  In our workloads we saw over
400K mem cgroups accumulated in some cases, of which only a few hundred
were online memcgs.  Although kswapd helps reduce the number of memcgs,
direct reclaim still gets hit iterating a large number of offline
memcgs in some cases, and we occasionally experienced responsiveness
problems because of it.

A simple test with perf shows it may take around 220ms to iterate 8K
memcgs in direct reclaim:
             dd 13873 [011]   578.542919: vmscan:mm_vmscan_direct_reclaim_begin
             dd 13873 [011]   578.758689: vmscan:mm_vmscan_direct_reclaim_end
So for 400K memcgs (50 times as many, at roughly the same cost per
memcg) it may take around 11 seconds to iterate them all.

Here we just break the iteration once it reclaims enough pages, as
memcg direct reclaim already does.  This may hurt fairness among
memcgs, but the cached iterator cookie helps to preserve fairness more
or less.
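To see why the one-line change below has this effect: global_reclaim(sc)
is true for both kswapd and global direct reclaim, so the old
!global_reclaim(sc) test only let limit (memcg) reclaim break out early,
while !current_is_kswapd() lets global direct reclaim break out as well
and leaves only kswapd walking the whole hierarchy.  A minimal userspace
sketch of the loop, using invented stand-ins (current_is_kswapd_stub(),
shrink_one_memcg(), and the constants) rather than real kernel code,
illustrates how the early break bounds the work:

/*
 * Toy userspace model of the memcg walk in shrink_node() -- all names
 * and numbers below are invented for illustration, not kernel code.
 */
#include <stdbool.h>
#include <stdio.h>

#define NR_MEMCGS	400000	/* mostly-offline cgroups, as in the changelog */
#define NR_TO_RECLAIM	32	/* some small per-invocation reclaim target */

struct scan_control {
	unsigned long nr_reclaimed;
	unsigned long nr_to_reclaim;
};

/* Stand-in for current_is_kswapd(): pretend we are in direct reclaim. */
static bool current_is_kswapd_stub(void)
{
	return false;
}

/* Stand-in for shrinking one memcg: pretend each yields one page. */
static unsigned long shrink_one_memcg(int memcg)
{
	(void)memcg;
	return 1;
}

int main(void)
{
	struct scan_control sc = {
		.nr_reclaimed = 0,
		.nr_to_reclaim = NR_TO_RECLAIM,
	};
	int visited = 0;

	for (int memcg = 0; memcg < NR_MEMCGS; memcg++) {
		sc.nr_reclaimed += shrink_one_memcg(memcg);
		visited++;

		/*
		 * The patched check: everyone except kswapd stops as
		 * soon as the reclaim target is met.  With the old
		 * check (!global_reclaim(sc)), global direct reclaim
		 * would fall through and keep walking all NR_MEMCGS.
		 */
		if (!current_is_kswapd_stub() &&
		    sc.nr_reclaimed >= sc.nr_to_reclaim)
			break;
	}

	printf("visited %d of %d memcgs\n", visited, NR_MEMCGS);
	return 0;
}

In direct reclaim the loop then visits only as many memcgs as it takes
to meet nr_to_reclaim instead of all of them; the next invocation
resumes the walk from the cached iterator cookie, which is what limits
the fairness impact.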
Link: http://lkml.kernel.org/r/1548799877-10949-1-git-send-email-yang.shi@xxxxxxxxxxxxxxxxx
Signed-off-by: Yang Shi <yang.shi@xxxxxxxxxxxxxxxxx>
Acked-by: Johannes Weiner <hannes@xxxxxxxxxxx>
Cc: Michal Hocko <mhocko@xxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 mm/vmscan.c |    7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

--- a/mm/vmscan.c~mm-vmscan-do-not-iterate-all-mem-cgroups-for-global-direct-reclaim
+++ a/mm/vmscan.c
@@ -2839,16 +2839,15 @@ static bool shrink_node(pg_data_t *pgdat
 				   sc->nr_reclaimed - reclaimed);
 
 			/*
-			 * Direct reclaim and kswapd have to scan all memory
-			 * cgroups to fulfill the overall scan target for the
-			 * node.
+			 * Kswapd have to scan all memory cgroups to fulfill
+			 * the overall scan target for the node.
 			 *
 			 * Limit reclaim, on the other hand, only cares about
 			 * nr_to_reclaim pages to be reclaimed and it will
 			 * retry with decreasing priority if one round over the
 			 * whole hierarchy is not sufficient.
 			 */
-			if (!global_reclaim(sc) &&
+			if (!current_is_kswapd() &&
 			    sc->nr_reclaimed >= sc->nr_to_reclaim) {
 				mem_cgroup_iter_break(root, memcg);
 				break;
_

Patches currently in -mm which might be from yang.shi@xxxxxxxxxxxxxxxxx are

mm-swap-check-if-swap-backing-device-is-congested-or-not.patch
mm-swap-check-if-swap-backing-device-is-congested-or-not-fix-2.patch
mm-swap-add-comment-for-swap_vma_readahead.patch
mm-swap-use-mem_cgroup_is_root-instead-of-deferencing-css-parent.patch
mm-vmscan-do-not-iterate-all-mem-cgroups-for-global-direct-reclaim.patch