+ mm-memcontrol-fix-numa-round-robin-reclaim-at-intermediate-level.patch added to -mm tree

The patch titled
     Subject: mm: memcontrol: fix NUMA round-robin reclaim at intermediate level
has been added to the -mm tree.  Its filename is
     mm-memcontrol-fix-numa-round-robin-reclaim-at-intermediate-level.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-memcontrol-fix-numa-round-robin-reclaim-at-intermediate-level.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-memcontrol-fix-numa-round-robin-reclaim-at-intermediate-level.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included in linux-next and is updated
there every 3-4 working days.

------------------------------------------------------
From: Johannes Weiner <hannes@xxxxxxxxxxx>
Subject: mm: memcontrol: fix NUMA round-robin reclaim at intermediate level

When a cgroup is reclaimed on behalf of a configured limit, reclaim needs
to round-robin through all NUMA nodes that hold pages of the memcg in
question.  However, when assembling the mask of candidate NUMA nodes, the
code only consults the *local* cgroup LRU counters, not the recursive
counters for the entire subtree.  Cgroup limits are frequently configured
against intermediate cgroups that do not have memory on their own LRUs. 
In this case, the node mask will always come up empty and reclaim falls
back to scanning only the current node.

If a cgroup subtree has some memory on one node but the processes are
bound to another node afterwards, the limit reclaim will never age or
reclaim that memory anymore.

To fix this, use the recursive LRU counts for a cgroup subtree to
determine which nodes hold memory of that cgroup.
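
As a rough userspace illustration (not kernel code), the difference
between the local and the recursive view can be modeled as below.
Everything prefixed toy_ is hypothetical and exists only for this
sketch; the explicit tree walk is a simplification chosen for brevity
and is not meant to describe how the kernel maintains its recursive
counters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_NODES    4
#define TOY_MAX_CHILDREN 4

/* Toy stand-in for a memcg: LRU page counts per node for this group only. */
struct toy_cgroup {
	unsigned long local_lru[TOY_MAX_NODES];
	struct toy_cgroup *children[TOY_MAX_CHILDREN];
	size_t nr_children;
};

/* Pages on node nid charged to this group alone (the pre-patch view). */
static unsigned long toy_lru_local(const struct toy_cgroup *cg, int nid)
{
	assert(nid >= 0 && nid < TOY_MAX_NODES);
	return cg->local_lru[nid];
}

/* Pages on node nid charged to this group or any descendant. */
static unsigned long toy_lru_recursive(const struct toy_cgroup *cg, int nid)
{
	unsigned long sum = toy_lru_local(cg, nid);

	for (size_t i = 0; i < cg->nr_children; i++)
		sum += toy_lru_recursive(cg->children[i], nid);
	return sum;
}

/* Build a bitmask of the nodes that hold pages for this group. */
static unsigned int toy_scan_nodes(const struct toy_cgroup *cg, bool recursive)
{
	unsigned int mask = 0;

	for (int nid = 0; nid < TOY_MAX_NODES; nid++) {
		unsigned long pages = recursive ? toy_lru_recursive(cg, nid)
						: toy_lru_local(cg, nid);
		if (pages)
			mask |= 1u << nid;
	}
	return mask;
}
```

For an intermediate group with no pages of its own whose only child
holds pages on node 2, toy_scan_nodes(parent, false) yields an empty
mask, while toy_scan_nodes(parent, true) yields a mask with bit 2 set.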

The code has been broken like this forever, so it doesn't seem to be a
problem in practice.  I just noticed it while reviewing the way the LRU
counters are used in general.

Link: http://lkml.kernel.org/r/20190412151507.2769-5-hannes@xxxxxxxxxxx
Signed-off-by: Johannes Weiner <hannes@xxxxxxxxxxx>
Reviewed-by: Shakeel Butt <shakeelb@xxxxxxxxxx>
Reviewed-by: Roman Gushchin <guro@xxxxxx>
Cc: Michal Hocko <mhocko@xxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 mm/memcontrol.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

--- a/mm/memcontrol.c~mm-memcontrol-fix-numa-round-robin-reclaim-at-intermediate-level
+++ a/mm/memcontrol.c
@@ -1512,13 +1512,13 @@ static bool test_mem_cgroup_node_reclaim
 {
 	struct lruvec *lruvec = mem_cgroup_lruvec(NODE_DATA(nid), memcg);
 
-	if (lruvec_page_state_local(lruvec, NR_INACTIVE_FILE) ||
-	    lruvec_page_state_local(lruvec, NR_ACTIVE_FILE))
+	if (lruvec_page_state(lruvec, NR_INACTIVE_FILE) ||
+	    lruvec_page_state(lruvec, NR_ACTIVE_FILE))
 		return true;
 	if (noswap || !total_swap_pages)
 		return false;
-	if (lruvec_page_state_local(lruvec, NR_INACTIVE_ANON) ||
-	    lruvec_page_state_local(lruvec, NR_ACTIVE_ANON))
+	if (lruvec_page_state(lruvec, NR_INACTIVE_ANON) ||
+	    lruvec_page_state(lruvec, NR_ACTIVE_ANON))
 		return true;
 	return false;
 
_

Patches currently in -mm which might be from hannes@xxxxxxxxxxx are

mm-fix-inactive-list-balancing-between-numa-nodes-and-cgroups.patch
mm-memcontrol-track-lru-counts-in-the-vmstats-array.patch
mm-memcontrol-replace-zone-summing-with-lruvec_page_state.patch
mm-memcontrol-replace-node-summing-with-memcg_page_state.patch
mm-memcontrol-push-down-mem_cgroup_node_nr_lru_pages.patch
mm-memcontrol-push-down-mem_cgroup_nr_lru_pages.patch
mm-memcontrol-quarantine-the-mem_cgroup_nr_lru_pages-api.patch
mm-fix-false-positive-overcommit_guess-failures.patch
mm-memcontrol-make-cgroup-stats-and-events-query-api-explicitly-local.patch
mm-memcontrol-move-stat-event-counting-functions-out-of-line.patch
mm-memcontrol-fix-recursive-statistics-correctness-scalabilty.patch
mm-memcontrol-fix-numa-round-robin-reclaim-at-intermediate-level.patch



