The patch titled
     Subject: mm: fix draining remote pageset
has been added to the -mm mm-unstable branch.  Its filename is
     mm-fix-draining-remote-pageset.patch

This patch will shortly appear at
     https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-fix-draining-remote-pageset.patch

This patch will later appear in the mm-unstable branch at
    git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days

------------------------------------------------------
From: Huang Ying <ying.huang@xxxxxxxxx>
Subject: mm: fix draining remote pageset
Date: Fri, 11 Aug 2023 17:08:19 +0800

If there is no memory allocation/freeing in the remote pageset after some
time (3 seconds for now), the remote pageset will be drained to avoid
memory wastage.

But in the current implementation, the vmstat updater worker may not be
re-queued while we are waiting for the timeout (pcp->expire != 0) if there
are no vmstat changes, for example, when the CPU goes idle.

Fix this by guaranteeing that the vmstat updater worker is always
re-queued while we are waiting for the timeout.

The bug can be reproduced by allocating/freeing pages from a remote node
and then going idle.  The patch fixes it.

- Run some workloads, using `numactl` to bind the CPU to node 0 and memory
  to node 1, so that the PCP of the CPU on node 0 for the zone on node 1
  is filled.
- After the workloads finish, idle for 60s

- Check /proc/zoneinfo

With the original kernel, the number of pages in the PCP of the CPU on
node 0 for the zone on node 1 is non-zero after idle.  With the patched
kernel, it becomes 0 after idle.  This avoids keeping pages in the remote
PCP while the CPU is idle.

Christoph added:

: Having some pages from a remote NUMA node stuck in a pcp somewhere is
: making that memory unusable.  It is usually rare that these remote pages
: are needed again and so they may remain there for a long time if the
: situation is right.

Link: https://lkml.kernel.org/r/20230811090819.60845-1-ying.huang@xxxxxxxxx
Fixes: 7cc36bbddde5 ("vmstat: on-demand vmstat workers V8")
Signed-off-by: "Huang, Ying" <ying.huang@xxxxxxxxx>
Reviewed-by: Christoph Lameter <cl@xxxxxxxxx>
Cc: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>
Cc: Vlastimil Babka <vbabka@xxxxxxx>
Cc: Michal Hocko <mhocko@xxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 mm/vmstat.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

--- a/mm/vmstat.c~mm-fix-draining-remote-pageset
+++ a/mm/vmstat.c
@@ -855,8 +855,10 @@ static int refresh_cpu_vm_stats(bool do_
 				continue;
 			}
 
-			if (__this_cpu_dec_return(pcp->expire))
+			if (__this_cpu_dec_return(pcp->expire)) {
+				changes++;
 				continue;
+			}
 
 			if (__this_cpu_read(pcp->count)) {
 				drain_zone_pages(zone, this_cpu_ptr(pcp));
_

Patches currently in -mm which might be from ying.huang@xxxxxxxxx are

memory-tiering-add-abstract-distance-calculation-algorithms-management.patch
acpi-hmat-refactor-hmat_register_target_initiators.patch
acpi-hmat-calculate-abstract-distance-with-hmat.patch
dax-kmem-calculate-abstract-distance-with-general-interface.patch
mm-fix-draining-remote-pageset.patch