On Sat, Jun 8, 2024 at 8:53 AM Takero Funaki <flintglass@xxxxxxxxx> wrote:
>
> This patch fixes the zswap global shrinker, which did not shrink the
> zpool as expected.
>
> The issue it addresses is that `shrink_worker()` did not distinguish
> between unexpected errors and expected error codes that should be
> skipped, such as when there is no stored page in a memcg. This led to
> the shrinking process being aborted on the expected error codes.
>
> The shrinker should ignore these cases and skip to the next memcg.
> However, skipping all memcgs presents another problem. To address this,
> this patch tracks progress while walking the memcg tree and checks for
> progress once the tree walk is completed.
>
> To handle the empty memcg case, the helper function `shrink_memcg()` is
> modified to check whether the memcg is empty and, if so, return -ENOENT.
>
> Fixes: a65b0e7607cc ("zswap: make shrinking memcg-aware")
> Signed-off-by: Takero Funaki <flintglass@xxxxxxxxx>
> ---
>  mm/zswap.c | 31 ++++++++++++++++++++++++++-----
>  1 file changed, 26 insertions(+), 5 deletions(-)
>
> diff --git a/mm/zswap.c b/mm/zswap.c
> index d720a42069b6..1a90f434f247 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -1393,7 +1393,7 @@ static struct shrinker *zswap_alloc_shrinker(void)
>
>  static int shrink_memcg(struct mem_cgroup *memcg)
>  {
> -       int nid, shrunk = 0;
> +       int nid, shrunk = 0, stored = 0;
>
>         if (!mem_cgroup_zswap_writeback_enabled(memcg))
>                 return -EINVAL;
> @@ -1408,9 +1408,16 @@ static int shrink_memcg(struct mem_cgroup *memcg)
>         for_each_node_state(nid, N_NORMAL_MEMORY) {
>                 unsigned long nr_to_walk = 1;
>
> +               if (!list_lru_count_one(&zswap_list_lru, nid, memcg))
> +                       continue;
> +               ++stored;
>                 shrunk += list_lru_walk_one(&zswap_list_lru, nid, memcg,
>                                             &shrink_memcg_cb, NULL, &nr_to_walk);
>         }
> +
> +       if (!stored)
> +               return -ENOENT;
> +

Can't we just check nr_to_walk here and return -ENOENT if it remains 1?

Something like:

        if (nr_to_walk)
                return -ENOENT;
        if (!shrunk)
                return -EAGAIN;
        return 0;

>         return shrunk ? 0 : -EAGAIN;
>  }
>
> @@ -1418,12 +1425,18 @@ static void shrink_worker(struct work_struct *w)
>  {
>         struct mem_cgroup *memcg = NULL;
>         struct mem_cgroup *next_memcg;
> -       int ret, failures = 0;
> +       int ret, failures = 0, progress;
>         unsigned long thr;
>
>         /* Reclaim down to the accept threshold */
>         thr = zswap_accept_thr_pages();
>
> +       /*
> +        * We might start from the last memcg.
> +        * That is not a failure.
> +        */
> +       progress = 1;
> +
>         /* global reclaim will select cgroup in a round-robin fashion.
>          *
>          * We save iteration cursor memcg into zswap_next_shrink,
> @@ -1461,9 +1474,12 @@ static void shrink_worker(struct work_struct *w)
>          */
>         if (!memcg) {
>                 spin_unlock(&zswap_shrink_lock);
> -               if (++failures == MAX_RECLAIM_RETRIES)
> +
> +               /* tree walk completed but no progress */
> +               if (!progress && ++failures == MAX_RECLAIM_RETRIES)
>                         break;

It seems like we may keep iterating the entire hierarchy a lot of times
as long as we are making any type of progress. This doesn't seem right.

>
> +               progress = 0;
>                 goto resched;
>         }
>
> @@ -1493,10 +1509,15 @@ static void shrink_worker(struct work_struct *w)
>                 /* drop the extra reference */
>                 mem_cgroup_put(memcg);
>
> -               if (ret == -EINVAL)
> -                       break;
> +               /* not a writeback candidate memcg */
> +               if (ret == -EINVAL || ret == -ENOENT)
> +                       continue;
> +

We should probably return -ENOENT for memcg with writeback disabled as
well.
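That is, in shrink_memcg(), something like (untested):

        /* not a writeback candidate, skip to the next memcg */
        if (!mem_cgroup_zswap_writeback_enabled(memcg))
                return -ENOENT;

so that shrink_worker() can treat -ENOENT uniformly as "skip this memcg"
and reserve the failure counting for real errors.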
>                 if (ret && ++failures == MAX_RECLAIM_RETRIES)
>                         break;
> +
> +               ++progress;
> +               /* reschedule as we performed some IO */
>  resched:
>                 cond_resched();
>         } while (zswap_total_pages() > thr);
> --
> 2.43.0
>
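FWIW, putting the two suggestions together, shrink_memcg() could end up
looking roughly like this (completely untested, just to sketch the idea):

static int shrink_memcg(struct mem_cgroup *memcg)
{
        int nid, shrunk = 0;
        bool stored = false;

        /* skip, rather than fail on, memcgs with writeback disabled */
        if (!mem_cgroup_zswap_writeback_enabled(memcg))
                return -ENOENT;

        for_each_node_state(nid, N_NORMAL_MEMORY) {
                unsigned long nr_to_walk = 1;

                shrunk += list_lru_walk_one(&zswap_list_lru, nid, memcg,
                                            &shrink_memcg_cb, NULL,
                                            &nr_to_walk);
                /* the walk decrements nr_to_walk when it visits an entry */
                if (!nr_to_walk)
                        stored = true;
        }

        /* nothing is stored for this memcg on any node, skip it */
        if (!stored)
                return -ENOENT;

        return shrunk ? 0 : -EAGAIN;
}

This also avoids the extra list_lru_count_one() call, since nr_to_walk
already tells us whether the walk found an entry.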