[PATCH] mm/swap, workingset: make anon shadow nodes memcg aware

From: Kairui Song <kasong@xxxxxxxxxxx>

Currently, the workingset (shadow) nodes of the swap cache are not
accounted to their corresponding memory cgroup. Instead, they are all
accounted to the root cgroup. This leads to inaccurate accounting and
ineffective reclaim: one cgroup can swap out a large amount of memory
and consume a large amount of memory with shadow nodes without being
charged for it.

This issue is similar to commit 7b785645e8f1 ("mm: fix page cache
convergence regression"), where page cache shadow nodes were incorrectly
accounted. That was due to the accidental dropping of the accounting
flag during the XArray conversion in commit a28334862993
("page cache: Finish XArray conversion").

However, the issue fixed here has a different cause. Swap cache shadow
nodes were never accounted, even before the XArray conversion, since
they did not exist until commit 3852f6768ede ("mm/swapcache: support to
handle the shadow entries"), which was years after the XArray
conversion.

It's worth noting that one anon shadow XArray node may contain entries
from different cgroups, and it gets accounted at reclaim time, so it's
arguable which cgroup it should be accounted to (as Shakeel Butt
pointed out [1]). File pages may suffer from a similar issue, but it is
less common. Things like proactive memory reclaim could make things
more complex.

So this commit still can't provide 100% accurate accounting of anon
shadows, but it covers the cases where one memory cgroup uses a
significant amount of swap, and in most cases memory pressure in one
cgroup is only supposed to reclaim from that cgroup and its children.
Besides, this fix is clean and simple.

Link: https://lore.kernel.org/all/7gzevefivueqtebzvikzbucnrnpurmh3scmfuiuo2tnrs37xso@haj7gzepjur2/ [1]
Signed-off-by: Kairui Song <kasong@xxxxxxxxxxx>

---

This patch was part of a previous series:
https://lore.kernel.org/all/20240624175313.47329-1-ryncsn@xxxxxxxxx/

Split out as a fix as suggested by Muchun and Shakeel.
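
For illustration only (not part of the patch): the mixed-ownership case
described in the changelog can be sketched with a userspace toy model.
Everything here (toy_node, toy_store_shadow(), toy_foreign_slots(), the
64-slot node size, and the rule that the first store charges the node)
is a hypothetical simplification, not kernel code and not the kernel's
actual charging logic.

```c
#include <assert.h>
#include <stddef.h>

#define SLOTS_PER_NODE 64  /* assumed node fan-out for this toy model */
#define MAX_CGROUPS    8   /* hypothetical limit, illustration only */
#define NO_OWNER       (-1)

/* Toy stand-in for one XArray node holding shadow entries. */
struct toy_node {
	int charged_to;                 /* cgroup charged for the node itself */
	int slot_owner[SLOTS_PER_NODE]; /* cgroup owning the shadow in each slot */
};

static void toy_node_init(struct toy_node *node)
{
	node->charged_to = NO_OWNER;
	for (size_t i = 0; i < SLOTS_PER_NODE; i++)
		node->slot_owner[i] = NO_OWNER;
}

/*
 * Store a shadow entry for @cgroup in @slot. In this model the first
 * store "allocates" the node, so that cgroup is charged for it
 * (assumption: a simplification of charge-at-allocation behaviour).
 */
static void toy_store_shadow(struct toy_node *node, size_t slot, int cgroup)
{
	assert(slot < SLOTS_PER_NODE);
	assert(cgroup >= 0 && cgroup < MAX_CGROUPS);

	if (node->charged_to == NO_OWNER)
		node->charged_to = cgroup;
	node->slot_owner[slot] = cgroup;
}

/* Count occupied slots that belong to a cgroup other than the charged one. */
static size_t toy_foreign_slots(const struct toy_node *node)
{
	size_t n = 0;

	for (size_t i = 0; i < SLOTS_PER_NODE; i++)
		if (node->slot_owner[i] != NO_OWNER &&
		    node->slot_owner[i] != node->charged_to)
			n++;
	return n;
}
```

Storing one shadow for cgroup 1 and then two for cgroup 2 leaves the
node charged to cgroup 1 with two foreign slots, which is the kind of
residual inaccuracy the changelog accepts.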

 mm/swap_state.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/swap_state.c b/mm/swap_state.c
index 4669f29cf555..b4ed2c664c67 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -97,6 +97,7 @@ int add_to_swap_cache(struct folio *folio, swp_entry_t entry,
 	void *old;
 
 	xas_set_update(&xas, workingset_update_node);
+	xas_set_lru(&xas, &shadow_nodes);
 
 	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
 	VM_BUG_ON_FOLIO(folio_test_swapcache(folio), folio);
@@ -718,7 +719,7 @@ int init_swap_address_space(unsigned int type, unsigned long nr_pages)
 		return -ENOMEM;
 	for (i = 0; i < nr; i++) {
 		space = spaces + i;
-		xa_init_flags(&space->i_pages, XA_FLAGS_LOCK_IRQ);
+		xa_init_flags(&space->i_pages, XA_FLAGS_LOCK_IRQ | XA_FLAGS_ACCOUNT);
 		atomic_set(&space->i_mmap_writable, 0);
 		space->a_ops = &swap_aops;
 		/* swap cache doesn't use writeback related tags */
-- 
2.45.2




