Patch "zswap: do not shrink if cgroup may not zswap" has been added to the 6.3-stable tree

<gregkh@xxxxxxxxxxxxxxxxxxx> · Sat, 17 Jun 2023 10:33:14 +0200

This is a note to let you know that I've just added the patch titled

    zswap: do not shrink if cgroup may not zswap

to the 6.3-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     zswap-do-not-shrink-if-cgroup-may-not-zswap.patch
and it can be found in the queue-6.3 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.


>From 0bdf0efa180a9cb1361cbded4e2260a49306ac89 Mon Sep 17 00:00:00 2001
From: Nhat Pham <nphamcs@xxxxxxxxx>
Date: Tue, 30 May 2023 15:24:40 -0700
Subject: zswap: do not shrink if cgroup may not zswap

From: Nhat Pham <nphamcs@xxxxxxxxx>

commit 0bdf0efa180a9cb1361cbded4e2260a49306ac89 upstream.

Before storing a page, zswap first checks if the number of stored pages
exceeds the limit specified by memory.zswap.max, for each cgroup in the
hierarchy.  If this limit is reached or exceeded, then zswap shrinking is
triggered and short-circuits the store attempt.

However, since the zswap's LRU is not memcg-aware, this can create the
following pathological behavior: the cgroup whose zswap limit is 0 will
evict pages from other cgroups continually, without lowering its own zswap
usage.  This means the shrinking will continue until the need for swap
ceases or the pool becomes empty.

As a result of this, we observe a disproportionate amount of zswap
writeback and a perpetually small zswap pool in our experiments, even
though the pool limit is never hit.

More generally, a cgroup might unnecessarily evict pages from other
cgroups before we drive the memcg back below its limit.

This patch fixes the issue by rejecting zswap store attempt without
shrinking the pool when obj_cgroup_may_zswap() returns false.

[akpm@xxxxxxxxxxxxxxxxxxxx: fix return of unintialized value]
[akpm@xxxxxxxxxxxxxxxxxxxx: s/ENOSPC/ENOMEM/]
Link: https://lkml.kernel.org/r/20230530222440.2777700-1-nphamcs@xxxxxxxxx
Link: https://lkml.kernel.org/r/20230530232435.3097106-1-nphamcs@xxxxxxxxx
Fixes: f4840ccfca25 ("zswap: memcg accounting")
Signed-off-by: Nhat Pham <nphamcs@xxxxxxxxx>
Cc: Dan Streetman <ddstreet@xxxxxxxx>
Cc: Domenico Cerasuolo <cerasuolodomenico@xxxxxxxxx>
Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
Cc: Seth Jennings <sjenning@xxxxxxxxxx>
Cc: Vitaly Wool <vitaly.wool@xxxxxxxxxxxx>
Cc: Yosry Ahmed <yosryahmed@xxxxxxxxxx>
Cc: <stable@xxxxxxxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
---
 mm/zswap.c |   11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1141,9 +1141,16 @@ static int zswap_frontswap_store(unsigne
 		goto reject;
 	}
 
+	/*
+	 * XXX: zswap reclaim does not work with cgroups yet. Without a
+	 * cgroup-aware entry LRU, we will push out entries system-wide based on
+	 * local cgroup limits.
+	 */
 	objcg = get_obj_cgroup_from_page(page);
-	if (objcg && !obj_cgroup_may_zswap(objcg))
-		goto shrink;
+	if (objcg && !obj_cgroup_may_zswap(objcg)) {
+		ret = -ENOMEM;
+		goto reject;
+	}
 
 	/* reclaim space if needed */
 	if (zswap_is_full()) {


Patches currently in stable-queue which might be from nphamcs@xxxxxxxxx are

queue-6.3/zswap-do-not-shrink-if-cgroup-may-not-zswap.patch