Hi Andrew,

I want to request to drop this patch from the mm-tree. At the moment we
have no evidence that it is solving a real problem, but we know that it
will cause unexpected charging of the xarray nodes.

thanks,
Shakeel

On Tue, Aug 20, 2024 at 05:48:15PM GMT, Andrew Morton wrote:
>
> The patch titled
>      Subject: mm/swap, workingset: make anon shadow nodes memcg aware
> has been added to the -mm mm-unstable branch.  Its filename is
>      mm-swap-workingset-make-anon-shadow-nodes-memcg-aware.patch
>
> This patch will shortly appear at
>      https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-swap-workingset-make-anon-shadow-nodes-memcg-aware.patch
>
> This patch will later appear in the mm-unstable branch at
>      git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
>
> Before you just go and hit "reply", please:
>    a) Consider who else should be cc'ed
>    b) Prefer to cc a suitable mailing list as well
>    c) Ideally: find the original patch on the mailing list and do a
>       reply-to-all to that, adding suitable additional cc's
>
> *** Remember to use Documentation/process/submit-checklist.rst when
>     testing your code ***
>
> The -mm tree is included into linux-next via the mm-everything
> branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
> and is updated there every 2-3 working days
>
> ------------------------------------------------------
> From: Kairui Song <kasong@xxxxxxxxxxx>
> Subject: mm/swap, workingset: make anon shadow nodes memcg aware
> Date: Tue, 20 Aug 2024 17:23:59 +0800
>
> Currently, the workingset (shadow) nodes of the swap cache are not
> accounted to their corresponding memory cgroup; instead, they are all
> accounted to the root cgroup.  This leads to inaccurate accounting and
> ineffective reclaim: one cgroup could swap out a large amount of memory
> and tie up a large amount of memory in shadow nodes without being
> accounted for it.
>
> This issue is similar to commit 7b785645e8f1 ("mm: fix page cache
> convergence regression"), where page cache shadow nodes were incorrectly
> accounted.  That was due to the accidental dropping of the accounting
> flag during the XArray conversion in commit a28334862993 ("page cache:
> Finish XArray conversion").
>
> However, this fix has a different cause.  Swap cache shadow nodes were
> never accounted, even before the XArray conversion, since they did not
> exist until commit 3852f6768ede ("mm/swapcache: support to handle the
> shadow entries"), which landed years after the XArray conversion.
>
> It's worth noting that one anon shadow XArray node may contain entries
> from different cgroups, and it gets accounted at reclaim time, so it's
> arguable which cgroup it should be accounted to (as Shakeel Butt pointed
> out [1]).  File pages may suffer from a similar issue, but it is less
> common there.  Things like proactive memory reclaim could make this more
> complex.
>
> So this commit still can't provide 100% accurate accounting of anon
> shadow nodes, but it covers the cases where one memory cgroup uses a
> significant amount of swap, and in most cases memory pressure in one
> cgroup is only supposed to reclaim that cgroup and its children.
> Besides, this fix is clean and simple enough.
>
> Link: https://lore.kernel.org/all/7gzevefivueqtebzvikzbucnrnpurmh3scmfuiuo2tnrs37xso@haj7gzepjur2/ [1]
> Link: https://lkml.kernel.org/r/20240820092359.97782-1-ryncsn@xxxxxxxxx
> Signed-off-by: Kairui Song <kasong@xxxxxxxxxxx>
> Cc: Chengming Zhou <zhouchengming@xxxxxxxxxxxxx>
> Cc: Chris Li <chrisl@xxxxxxxxxx>
> Cc: "Huang, Ying" <ying.huang@xxxxxxxxx>
> Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
> Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx>
> Cc: Michal Hocko <mhocko@xxxxxxxx>
> Cc: Muchun Song <muchun.song@xxxxxxxxx>
> Cc: Nhat Pham <nphamcs@xxxxxxxxx>
> Cc: Roman Gushchin <roman.gushchin@xxxxxxxxx>
> Cc: Shakeel Butt <shakeel.butt@xxxxxxxxx>
> Cc: Yosry Ahmed <yosryahmed@xxxxxxxxxx>
> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> ---
>
>  mm/swap_state.c |    3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> --- a/mm/swap_state.c~mm-swap-workingset-make-anon-shadow-nodes-memcg-aware
> +++ a/mm/swap_state.c
> @@ -97,6 +97,7 @@ int add_to_swap_cache(struct folio *foli
>  	void *old;
>
>  	xas_set_update(&xas, workingset_update_node);
> +	xas_set_lru(&xas, &shadow_nodes);
>
>  	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
>  	VM_BUG_ON_FOLIO(folio_test_swapcache(folio), folio);
> @@ -718,7 +719,7 @@ int init_swap_address_space(unsigned int
>  		return -ENOMEM;
>  	for (i = 0; i < nr; i++) {
>  		space = spaces + i;
> -		xa_init_flags(&space->i_pages, XA_FLAGS_LOCK_IRQ);
> +		xa_init_flags(&space->i_pages, XA_FLAGS_LOCK_IRQ | XA_FLAGS_ACCOUNT);
>  		atomic_set(&space->i_mmap_writable, 0);
>  		space->a_ops = &swap_aops;
>  		/* swap cache doesn't use writeback related tags */
> _
>
> Patches currently in -mm which might be from kasong@xxxxxxxxxxx are
>
> mm-swap-clean-up-initialization-helper.patch
> mm-swap-skip-slot-cache-on-freeing-for-mthp.patch
> mm-swap-allow-cache-reclaim-to-skip-slot-cache.patch
> mm-swap-add-a-fragment-cluster-list.patch
> mm-swap-relaim-the-cached-parts-that-got-scanned.patch
> mm-swap-add-a-adaptive-full-cluster-cache-reclaim.patch
> mm-swap-workingset-make-anon-shadow-nodes-memcg-aware.patch
>