On Fri, Jul 5, 2024 at 7:25 PM Takero Funaki <flintglass@xxxxxxxxx> wrote:
>
> This patch fixes the zswap global shrinker, which did not shrink the
> zpool as expected.
>
> The issue it addresses is that `shrink_worker()` did not distinguish
> between unexpected errors and expected error codes that should be
> skipped, such as when there is no stored page in a memcg. This led to
> the shrinking process being aborted on the expected error codes.
>
> The shrinker should ignore these cases and skip to the next memcg.
> However, skipping all memcgs presents another problem. To address this,
> this patch tracks progress while walking the memcg tree and checks for
> progress once the tree walk is completed.
>
> To handle the empty memcg case, the helper function `shrink_memcg()` is
> modified to check if the memcg is empty and then return -ENOENT.
>
> Fixes: a65b0e7607cc ("zswap: make shrinking memcg-aware")
> Signed-off-by: Takero Funaki <flintglass@xxxxxxxxx>
> ---
>  mm/zswap.c | 31 +++++++++++++++++++++++++------
>  1 file changed, 25 insertions(+), 6 deletions(-)
>
> diff --git a/mm/zswap.c b/mm/zswap.c
> index 29944d8145af..f092932e652b 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -1317,10 +1317,10 @@ static struct shrinker *zswap_alloc_shrinker(void)
>
>  static int shrink_memcg(struct mem_cgroup *memcg)
>  {
> -        int nid, shrunk = 0;
> +        int nid, shrunk = 0, scanned = 0;
>
>          if (!mem_cgroup_zswap_writeback_enabled(memcg))
> -                return -EINVAL;
> +                return -ENOENT;
>
>          /*
>           * Skip zombies because their LRUs are reparented and we would be
> @@ -1334,19 +1334,30 @@ static int shrink_memcg(struct mem_cgroup *memcg)
>
>                  shrunk += list_lru_walk_one(&zswap_list_lru, nid, memcg,
>                                              &shrink_memcg_cb, NULL, &nr_to_walk);
> +                scanned += 1 - nr_to_walk;
>          }
> +
> +        if (!scanned)
> +                return -ENOENT;
> +
>          return shrunk ? 0 : -EAGAIN;
>  }
>
>  static void shrink_worker(struct work_struct *w)
>  {
>          struct mem_cgroup *memcg;
> -        int ret, failures = 0;
> +        int ret, failures = 0, progress;
>          unsigned long thr;
>
>          /* Reclaim down to the accept threshold */
>          thr = zswap_accept_thr_pages();
>
> +        /*
> +         * We might start from the last memcg.
> +         * That is not a failure.
> +         */
> +        progress = 1;
> +

I think this is unneeded complexity. Doing one less retry if we happen
to start at the last memcg should be fine.

>          /* global reclaim will select cgroup in a round-robin fashion.
>           *
>           * We save iteration cursor memcg into zswap_next_shrink,
> @@ -1390,9 +1401,12 @@ static void shrink_worker(struct work_struct *w)
>                   */
>                  if (!memcg) {
>                          spin_unlock(&zswap_shrink_lock);
> -                        if (++failures == MAX_RECLAIM_RETRIES)
> +
> +                        /* tree walk completed but no progress */
> +                        if (!progress && ++failures == MAX_RECLAIM_RETRIES)
>                                  break;
>
> +                        progress = 0;
>                          goto resched;
>                  }
>
> @@ -1407,10 +1421,15 @@ static void shrink_worker(struct work_struct *w)
>                  /* drop the extra reference */
>                  mem_cgroup_put(memcg);
>
> -                if (ret == -EINVAL)
> -                        break;
> +                /* not a writeback candidate memcg */

Unneeded comment.

> +                if (ret == -ENOENT)
> +                        continue;
> +
>                  if (ret && ++failures == MAX_RECLAIM_RETRIES)
>                          break;
> +
> +                ++progress;
> +                /* reschedule as we performed some IO */

Unneeded comment.

>  resched:
>                  cond_resched();
>          } while (zswap_total_pages() > thr);
> --
> 2.43.0
>
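
To illustrate what I mean about the progress initialization, here is an
untested sketch against the quoted hunks. Everything is taken from the
patch as posted; the only change is that progress starts at zero, so a
walk that happens to begin at the last memcg simply costs one retry:

        static void shrink_worker(struct work_struct *w)
        {
                struct mem_cgroup *memcg;
                /* no special-case init needed; first wrap just burns a retry */
                int ret, failures = 0, progress = 0;
                unsigned long thr;

                /* Reclaim down to the accept threshold */
                thr = zswap_accept_thr_pages();

                do {
                        ...
                        if (!memcg) {
                                spin_unlock(&zswap_shrink_lock);

                                /* tree walk completed but no progress */
                                if (!progress && ++failures == MAX_RECLAIM_RETRIES)
                                        break;

                                progress = 0;
                                goto resched;
                        }
                        ...
        resched:
                        cond_resched();
                } while (zswap_total_pages() > thr);
        }

The behavior is otherwise identical; we just tolerate at most one wasted
failure increment per shrinker invocation instead of carrying the extra
state and comment block.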