The patch titled Subject: zswap: do not shrink if cgroup may not zswap has been added to the -mm mm-unstable branch. Its filename is zswap-do-not-shrink-if-cgroup-may-not-zswap.patch This patch will shortly appear at https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/zswap-do-not-shrink-if-cgroup-may-not-zswap.patch This patch will later appear in the mm-unstable branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's *** Remember to use Documentation/process/submit-checklist.rst when testing your code *** The -mm tree is included into linux-next via the mm-everything branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm and is updated there every 2-3 working days ------------------------------------------------------ From: Nhat Pham <nphamcs@xxxxxxxxx> Subject: zswap: do not shrink if cgroup may not zswap Date: Tue, 30 May 2023 15:24:40 -0700 Before storing a page, zswap first checks if the number of stored pages exceeds the limit specified by memory.zswap.max, for each cgroup in the hierarchy. If this limit is reached or exceeded, then zswap shrinking is triggered and short-circuits the store attempt. However, since the zswap's LRU is not memcg-aware, this can create the following pathological behavior: the cgroup whose zswap limit is reached will evict pages from other cgroups continually, without lowering its own zswap usage. This means the shrinking will continue until the need for swap ceases or the pool becomes empty. As a result of this, we observe a disproportionate amount of zswap writeback and a perpetually small zswap pool in our experiments, even though the pool limit is never hit. This patch fixes the issue by rejecting zswap store attempt without shrinking the pool when obj_cgroup_may_zswap() returns false. Link: https://lkml.kernel.org/r/20230530222440.2777700-1-nphamcs@xxxxxxxxx Fixes: f4840ccfca25 ("zswap: memcg accounting") Signed-off-by: Nhat Pham <nphamcs@xxxxxxxxx> Cc: Dan Streetman <ddstreet@xxxxxxxx> Cc: Domenico Cerasuolo <cerasuolodomenico@xxxxxxxxx> Cc: Johannes Weiner <hannes@xxxxxxxxxxx> Cc: Seth Jennings <sjenning@xxxxxxxxxx> Cc: Vitaly Wool <vitaly.wool@xxxxxxxxxxxx> Cc: Yosry Ahmed <yosryahmed@xxxxxxxxxx> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> --- mm/zswap.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) --- a/mm/zswap.c~zswap-do-not-shrink-if-cgroup-may-not-zswap +++ a/mm/zswap.c @@ -1174,9 +1174,14 @@ static int zswap_frontswap_store(unsigne goto reject; } + /* + * XXX: zswap reclaim does not work with cgroups yet. Without a + * cgroup-aware entry LRU, we will push out entries system-wide based on + * local cgroup limits. + */ objcg = get_obj_cgroup_from_page(page); if (objcg && !obj_cgroup_may_zswap(objcg)) - goto shrink; + goto reject; /* reclaim space if needed */ if (zswap_is_full()) { _ Patches currently in -mm which might be from nphamcs@xxxxxxxxx are workingset-refactor-lru-refault-to-expose-refault-recency-check.patch cachestat-implement-cachestat-syscall.patch cachestat-implement-cachestat-syscall-fix.patch cachestat-wire-up-cachestat-for-other-architectures.patch cachestat-wire-up-cachestat-for-other-architectures-fix.patch selftests-add-selftests-for-cachestat.patch zswap-do-not-shrink-if-cgroup-may-not-zswap.patch