[merged mm-stable] mm-fix-draining-remote-pageset.patch removed from -mm tree

Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> · Wed, 25 Oct 2023 16:47:39 -0700

The quilt patch titled
     Subject: mm: fix draining remote pageset
has been removed from the -mm tree.  Its filename was
     mm-fix-draining-remote-pageset.patch

This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

------------------------------------------------------
From: Huang Ying <ying.huang@xxxxxxxxx>
Subject: mm: fix draining remote pageset
Date: Fri, 11 Aug 2023 17:08:19 +0800

If there is no memory allocation/freeing in the PCP (Per-CPU Pageset) of a
remote zone (zone in remote NUMA node) after some time (3 seconds for
now), the pages of the PCP of the remote zone will be drained to avoid
memory wastage.

This behavior was introduced in the commit 4ae7c03943fc ("[PATCH]
Periodically drain non local pagesets") and the commit 4037d452202e ("Move
remote node draining out of slab allocators")

But, after the commit 7cc36bbddde5 ("vmstat: on-demand vmstat workers
V8"), the vmstat updater worker which is used to drain the PCP of remote
zones may not be re-queued when we are waiting for the timeout
(pcp->expire != 0) if there are no vmstat changes on this CPU, for
example, when the CPU goes idle or runs user space only workloads.  This
may cause the pages of a remote zone be kept in PCP of this CPU for long
time.  So that, the page reclaiming of the remote zone may be triggered
prematurely.  This isn't a severe problem in practice, because the PCP of
the remote zone will be drained if some memory are allocated/freed again
on this CPU.  And, the PCP will eventually be drained during the direct
reclaiming if necessary.

Anyway, the problem still deserves a fix via guaranteeing that the vmstat
updater worker will always be re-queued when we are waiting for the
timeout.  In effect, this restores the original behavior before the commit
7cc36bbddde5.

We can reproduce the bug via allocating/freeing pages from a remote zone
then go idle as follows.  And the patch can fix it.

- Run some workloads, use `numactl` to bind CPU to node 0 and memory to
  node 1.  So the PCP of the CPU on node 0 for zone on node 1 will be
  filled.

- After workloads finish, idle for 60s

- Check /proc/zoneinfo

With the original kernel, the number of pages in the PCP of the CPU on
node 0 for zone on node 1 is non-zero after idle.  With the patched
kernel, it becomes 0 after idle.  That is, we avoid to keep pages in the
remote PCP during idle.

Link: https://lkml.kernel.org/r/20231007062356.187621-1-ying.huang@xxxxxxxxx
Link: https://lkml.kernel.org/r/20230811090819.60845-1-ying.huang@xxxxxxxxx
Fixes: 7cc36bbddde5 ("vmstat: on-demand vmstat workers V8")
Signed-off-by: "Huang, Ying" <ying.huang@xxxxxxxxx>
Reviewed-by: Christoph Lameter <cl@xxxxxxxxx>
Cc: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>
Cc: Vlastimil Babka <vbabka@xxxxxxx>
Cc: Michal Hocko <mhocko@xxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 mm/vmstat.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

--- a/mm/vmstat.c~mm-fix-draining-remote-pageset
+++ a/mm/vmstat.c
@@ -857,8 +857,10 @@ static int refresh_cpu_vm_stats(bool do_
 				continue;
 			}
 
-			if (__this_cpu_dec_return(pcp->expire))
+			if (__this_cpu_dec_return(pcp->expire)) {
+				changes++;
 				continue;
+			}
 
 			if (__this_cpu_read(pcp->count)) {
 				drain_zone_pages(zone, this_cpu_ptr(pcp));
_

Patches currently in -mm which might be from ying.huang@xxxxxxxxx are