+ mm-fix-draining-remote-pageset.patch added to mm-unstable branch

The patch titled
     Subject: mm: fix draining remote pageset
has been added to the -mm mm-unstable branch.  Its filename is
     mm-fix-draining-remote-pageset.patch

This patch will shortly appear at
     https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-fix-draining-remote-pageset.patch

This patch will later appear in the mm-unstable branch at
    git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days.

------------------------------------------------------
From: Huang Ying <ying.huang@xxxxxxxxx>
Subject: mm: fix draining remote pageset
Date: Fri, 11 Aug 2023 17:08:19 +0800

If there has been no memory allocation or freeing in a remote pageset
for some time (currently 3 seconds), the remote pageset is drained to
avoid wasting memory.

But in the current implementation, the vmstat updater worker may not be
re-queued while we are waiting for the timeout (pcp->expire != 0) if
there are no vmstat changes, for example when the CPU goes idle.

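This is fixed by guaranteeing that the vmstat updater worker is always
re-queued while we are waiting for the timeout: a pending expiry now
counts as a change, so refresh_cpu_vm_stats() returns non-zero and
vmstat_update() re-queues the work.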
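The countdown and the effect of the one-line fix can be illustrated with
a small user-space model. This is a simplified sketch, not kernel code:
`struct model_pcp`, `model_refresh()` and `model_pages_left_after_idle()`
are hypothetical names, only a single remote zone is modelled, and the
per-zone stat folding done by refresh_cpu_vm_stats() is collapsed into a
single `stat_diff` flag. As in the kernel, a pass that folds stat changes
resets `expire` to 3, and the worker runs again only if the previous pass
reported changes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model of one remote per-CPU pageset; the field
 * names mirror struct per_cpu_pages, but this is not kernel code. */
struct model_pcp {
	int count;  /* pages cached in the pageset */
	int expire; /* worker ticks left before the pageset is drained */
};

/*
 * One simplified pass of refresh_cpu_vm_stats(true) over a single remote
 * zone. Returns the number of changes; the worker only re-queues itself
 * when this is non-zero.
 */
static int model_refresh(struct model_pcp *pcp, bool stat_diff, bool fixed)
{
	int changes = 0;

	if (stat_diff) {
		/* Folding stat diffs counts as a change and restarts the
		 * 3-tick countdown. */
		changes++;
		pcp->expire = 3;
	}

	/* Nothing to expire. */
	if (!pcp->expire || !pcp->count)
		return changes;

	if (--pcp->expire) {
		/* Still waiting for the timeout. The patch counts this as a
		 * change so that the worker keeps running. */
		if (fixed)
			changes++;
		return changes;
	}

	/* Timeout reached: drain. Simplified: the model empties the whole
	 * pageset in one step. */
	pcp->count = 0;
	changes++;
	return changes;
}

/*
 * Run the worker after a burst of remote activity, then let the CPU go
 * idle (no further stat diffs). Returns the pages still cached once the
 * worker stops re-queuing itself.
 */
static int model_pages_left_after_idle(int pages, bool fixed)
{
	struct model_pcp pcp = { .count = pages, .expire = 0 };
	/* The first tick sees the stat diffs produced by the workload. */
	int changes = model_refresh(&pcp, true, fixed);

	/* Each later tick happens only if the previous one re-queued. */
	while (changes)
		changes = model_refresh(&pcp, false, fixed);
	return pcp.count;
}
```

In this model, model_pages_left_after_idle(64, false) returns 64: the
first idle tick takes `expire` from 2 to 1, reports no changes, and the
worker stops with the pages still cached. model_pages_left_after_idle(64,
true) returns 0, because the extra ticks let `expire` reach 0 and the
pageset is drained.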

The bug can be reproduced by allocating and freeing pages from a remote
node and then going idle, and the patch fixes it:

- Run some workloads with `numactl` binding the CPU to node 0 and memory
  to node 1, so that the PCP of the CPU on node 0 for the zone on node 1
  is filled.

- After the workloads finish, stay idle for 60s.

- Check /proc/zoneinfo.
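A workload for the first step could be as simple as the following
sketch, which repeatedly maps an anonymous region, writes to every page
so that it is actually allocated, and unmaps it so the pages are freed
again. `churn_pages()` is a hypothetical helper, not part of the original
report. When it runs under `numactl --cpunodebind=0 --membind=1`, the
pages should come from node 1 while being freed on a node 0 CPU, which is
the situation that fills the remote PCP.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS with glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical workload: map, touch and unmap `bytes` of anonymous memory
 * `rounds` times. Returns the number of pages touched, or -1 if a mapping
 * fails.
 */
static long churn_pages(size_t bytes, int rounds)
{
	long page = sysconf(_SC_PAGESIZE);
	long touched = 0;

	for (int r = 0; r < rounds; r++) {
		char *buf = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
				 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (buf == MAP_FAILED)
			return -1;

		/* The first write to each page faults it in, allocating it
		 * according to the current memory policy. */
		for (size_t off = 0; off < bytes; off += (size_t)page) {
			buf[off] = 1;
			touched++;
		}

		/* Unmapping frees the pages on the current CPU. */
		munmap(buf, bytes);
	}
	return touched;
}
```

A `main()` that calls, say, `churn_pages(256UL << 20, 100)` and is started
as `numactl --cpunodebind=0 --membind=1 ./churn` would match the setup
described above; the size and round count are arbitrary.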

With the original kernel, the number of pages in the PCP of the CPU on
node 0 for the zone on node 1 is non-zero after idling.  With the
patched kernel, it drops to 0, so pages are no longer kept in the remote
PCP while the CPU is idle.
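To read the value being compared, the per-CPU `count:` lines in the
`pagesets` section of /proc/zoneinfo can be summed for one CPU across all
zones of one node. The sketch below assumes the layout of recent kernels
(a `Node N, zone NAME` header per zone, then a `cpu: N` line followed by
an indented `count:` line for each CPU); /proc/zoneinfo is not a stable
ABI, so treat the parsing as an assumption. `pcp_count()` is a
hypothetical helper that returns -1 when no matching pageset is found.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Sum the PCP "count:" values that CPU `cpu` holds for all zones of NUMA
 * node `node` in a zoneinfo-formatted stream. Returns -1 if no matching
 * pageset is found.
 */
static long pcp_count(FILE *fp, int node, int cpu)
{
	char line[256];
	int cur_node = -1, cur_cpu = -1;
	long total = -1;

	while (fgets(line, sizeof(line), fp)) {
		int n;
		long count;

		if (sscanf(line, "Node %d, zone", &n) == 1) {
			/* New zone header: reset the CPU context. */
			cur_node = n;
			cur_cpu = -1;
		} else if (sscanf(line, " cpu: %d", &n) == 1) {
			/* Start of one CPU's pageset within the zone. */
			cur_cpu = n;
		} else if (cur_node == node && cur_cpu == cpu &&
			   sscanf(line, " count: %ld", &count) == 1) {
			total = (total < 0 ? 0 : total) + count;
		}
	}
	return total;
}
```

Assuming CPU 0 is on node 0, opening /proc/zoneinfo with fopen() and
calling `pcp_count(fp, 1, 0)` gives the number of node 1 pages cached by
that CPU; per the report, this stays non-zero after idling on the
original kernel and drops to 0 with the patch.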

Christoph added:

: Having some pages from a remote NUMA node stuck in a pcp somewhere is
: making that memory unusable. It is usually rare that these remote pages
: are needed again and so they may remain there for a long time if the
: situation is right.

Link: https://lkml.kernel.org/r/20230811090819.60845-1-ying.huang@xxxxxxxxx
Fixes: 7cc36bbddde5 ("vmstat: on-demand vmstat workers V8")
Signed-off-by: "Huang, Ying" <ying.huang@xxxxxxxxx>
Reviewed-by: Christoph Lameter <cl@xxxxxxxxx>
Cc: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>
Cc: Vlastimil Babka <vbabka@xxxxxxx>
Cc: Michal Hocko <mhocko@xxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 mm/vmstat.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

--- a/mm/vmstat.c~mm-fix-draining-remote-pageset
+++ a/mm/vmstat.c
@@ -855,8 +855,10 @@ static int refresh_cpu_vm_stats(bool do_
 				continue;
 			}
 
-			if (__this_cpu_dec_return(pcp->expire))
+			if (__this_cpu_dec_return(pcp->expire)) {
+				changes++;
 				continue;
+			}
 
 			if (__this_cpu_read(pcp->count)) {
 				drain_zone_pages(zone, this_cpu_ptr(pcp));
_

Patches currently in -mm which might be from ying.huang@xxxxxxxxx are

memory-tiering-add-abstract-distance-calculation-algorithms-management.patch
acpi-hmat-refactor-hmat_register_target_initiators.patch
acpi-hmat-calculate-abstract-distance-with-hmat.patch
dax-kmem-calculate-abstract-distance-with-general-interface.patch
mm-fix-draining-remote-pageset.patch



