On Mon, Jun 10, 2024 at 1:28 PM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote:
>
> On Sat, Jun 8, 2024 at 8:53 AM Takero Funaki <flintglass@xxxxxxxxx> wrote:
> >
> > This patch fixes the zswap global shrinker, which did not shrink the
> > zpool as expected.
> >
> > The issue it addresses is that `shrink_worker()` did not distinguish
> > between unexpected errors and expected error codes that should be
> > skipped, such as when there is no stored page in a memcg. This led to
> > the shrinking process being aborted on the expected error codes.
> >
> > The shrinker should ignore these cases and skip to the next memcg.
> > However, skipping all memcgs presents another problem. To address this,
> > this patch tracks progress while walking the memcg tree and checks for
> > progress once the tree walk is completed.
> >
> > To handle the empty memcg case, the helper function `shrink_memcg()` is
> > modified to check if the memcg is empty and then return -ENOENT.
> >
> > Fixes: a65b0e7607cc ("zswap: make shrinking memcg-aware")
> > Signed-off-by: Takero Funaki <flintglass@xxxxxxxxx>
> > ---
> >  mm/zswap.c | 31 ++++++++++++++++++++++++++-----
> >  1 file changed, 26 insertions(+), 5 deletions(-)
> >
> > diff --git a/mm/zswap.c b/mm/zswap.c
> > index d720a42069b6..1a90f434f247 100644
> > --- a/mm/zswap.c
> > +++ b/mm/zswap.c
> > @@ -1393,7 +1393,7 @@ static struct shrinker *zswap_alloc_shrinker(void)
> >
> >  static int shrink_memcg(struct mem_cgroup *memcg)
> >  {
> > -	int nid, shrunk = 0;
> > +	int nid, shrunk = 0, stored = 0;
> >
> >  	if (!mem_cgroup_zswap_writeback_enabled(memcg))
> >  		return -EINVAL;
> > @@ -1408,9 +1408,16 @@ static int shrink_memcg(struct mem_cgroup *memcg)
> >  	for_each_node_state(nid, N_NORMAL_MEMORY) {
> >  		unsigned long nr_to_walk = 1;
> >
> > +		if (!list_lru_count_one(&zswap_list_lru, nid, memcg))
> > +			continue;
> > +		++stored;
> >  		shrunk += list_lru_walk_one(&zswap_list_lru, nid, memcg,
> >  					    &shrink_memcg_cb, NULL, &nr_to_walk);
> >  	}
> > +
> > +	if (!stored)
> > +		return -ENOENT;
> > +
>
> Can't we just check nr_to_walk here and return -ENOENT if it remains as 1?
>
> Something like:
>
> if (nr_to_walk)
>         return -ENOENT;
> if (!shrunk)
>         return -EAGAIN;
> return 0;
>
> >  	return shrunk ? 0 : -EAGAIN;
> >  }
> >
> > @@ -1418,12 +1425,18 @@ static void shrink_worker(struct work_struct *w)
> >  {
> >  	struct mem_cgroup *memcg = NULL;
> >  	struct mem_cgroup *next_memcg;
> > -	int ret, failures = 0;
> > +	int ret, failures = 0, progress;
> >  	unsigned long thr;
> >
> >  	/* Reclaim down to the accept threshold */
> >  	thr = zswap_accept_thr_pages();
> >
> > +	/*
> > +	 * We might start from the last memcg.
> > +	 * That is not a failure.
> > +	 */
> > +	progress = 1;
> > +
> >  	/* global reclaim will select cgroup in a round-robin fashion.
> >  	 *
> >  	 * We save iteration cursor memcg into zswap_next_shrink,
> > @@ -1461,9 +1474,12 @@ static void shrink_worker(struct work_struct *w)
> >  		 */
> >  		if (!memcg) {
> >  			spin_unlock(&zswap_shrink_lock);
> > -			if (++failures == MAX_RECLAIM_RETRIES)
> > +
> > +			/* tree walk completed but no progress */
> > +			if (!progress && ++failures == MAX_RECLAIM_RETRIES)
> >  				break;
>
> It seems like we may keep iterating the entire hierarchy a lot of
> times as long as we are making any type of progress. This doesn't seem
> right.
>

Ah actually, reading your comment closer - yeah, the shrink-until-accept is
the intended-ish behavior (from Domenico's older patches to rework zswap
reclaim). The one-page-per-attempt is also "intended" behavior, but not any
of our intentions - just something that remains from the past implementation
as we rework this codebase one change at a time :)
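
[Editor's note: the retry logic under discussion - walk the memcg tree
round-robin, skip empty memcgs instead of counting them as failures, and
only advance the failure counter when a full tree walk makes no progress -
can be sketched as a small userspace simulation. This is an illustration,
not kernel code: `sim_shrink_memcg` and `sim_shrink_worker` are hypothetical
stand-ins, the memcg tree is modeled as a fixed array, and the real
`shrink_worker()`'s accept-threshold check is omitted.]

```c
/*
 * Userspace sketch of the zswap shrinker retry logic discussed above.
 * All names here are hypothetical; the memcg tree is modeled as an
 * array of per-"memcg" stored-page counts.
 */
#include <assert.h>
#include <errno.h>

#define MAX_RECLAIM_RETRIES 5
#define NR_MEMCGS 4

/* pages currently stored per simulated memcg */
static int stored[NR_MEMCGS] = { 0, 3, 0, 2 };

/* mimic shrink_memcg(): -ENOENT for an empty memcg (an expected,
 * skippable error), 0 after writing back one page */
static int sim_shrink_memcg(int i)
{
	if (!stored[i])
		return -ENOENT;
	stored[i]--;		/* one page per attempt, as in the real code */
	return 0;
}

/* mimic shrink_worker(): round-robin walk; a failure is only counted
 * when a completed tree walk reclaimed nothing */
static int sim_shrink_worker(void)
{
	int failures = 0, progress = 1, i = 0, reclaimed = 0;

	for (;;) {
		if (i == 0) {	/* one full tree walk completed */
			/* tree walk completed but no progress */
			if (!progress && ++failures == MAX_RECLAIM_RETRIES)
				break;
			progress = 0;
		}
		if (sim_shrink_memcg(i) == 0) {
			progress = 1;	/* -ENOENT does not clear this */
			reclaimed++;
		}
		i = (i + 1) % NR_MEMCGS;
	}
	return reclaimed;
}
```

With the counts above, the worker drains all five stored pages and then
stops after MAX_RECLAIM_RETRIES empty walks: empty memcgs return -ENOENT
without aborting the walk or, on their own, exhausting the failure budget.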