Re: [PATCH v1 1/3] mm: zswap: fix global shrinker memcg iteration

Nhat Pham <nphamcs@xxxxxxxxx> · Thu, 13 Jun 2024 08:04:39 -0700

On Wed, Jun 12, 2024 at 7:58 PM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote:
>
> On Wed, Jun 12, 2024 at 7:36 PM Takero Funaki <flintglass@xxxxxxxxx> wrote:
> >
> > On Thu, Jun 13, 2024 at 11:18 Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote:
> >
> > > > The corrected version of the cleaner should be:
> > > > ```c
> > > > void zswap_memcg_offline_cleanup(struct mem_cgroup *memcg)
> > > > {
> > > >         /* lock out zswap shrinker walking memcg tree */
> > > >         spin_lock(&zswap_shrink_lock);
> > > >         if (zswap_next_shrink == memcg) {
> > > >                 do {
> > > >                         zswap_next_shrink = mem_cgroup_iter(NULL,
> > > >                                         zswap_next_shrink, NULL);
> > > >                         spin_unlock(&zswap_shrink_lock);
> > > >                         spin_lock(&zswap_shrink_lock);
> > > >                         if (!zswap_next_shrink)
> > > >                                 break;
> > > >                 } while (!mem_cgroup_online(zswap_next_shrink));
> > > >         }
> > > >         spin_unlock(&zswap_shrink_lock);
> > > > }
> > > > ```
> > >
> > > Is the idea here to avoid moving the iterator to another offline memcg
> > > that zswap_memcg_offline_cleanup() was already called for, to avoid
> > > holding a ref on that memcg until the next run of zswap shrinking?
> > >
> > > If yes, I think it's probably worth doing. But why do we need to
> > > release and reacquire the lock in the loop above?
> >
> > Yes, the existing cleaner might leave the iterator pointing at an
> > offline, already-cleaned memcg.
> >
> > The lock is released and reacquired so that we do not loop inside the
> > critical section. In shrink_worker of the v0 patch, the loop was
> > restarted on an offline memcg without releasing the lock. Nhat pointed
> > out that we should drop the lock after every mem_cgroup_iter() call.
> > v1 was changed to reacquire it once per iteration, like the cleaner
> > code above.
>
> I am not sure how often we'll run into a situation where we'll be
> holding the lock for too long tbh. It should be unlikely to keep
> encountering offline memcgs for a long time.
>
> Nhat, do you think this could cause a problem in practice?

I don't remember prescribing anything, to be honest :) I think I was
just asking why we can't just drop the lock, then "continue;". That was
mostly for simplicity's sake.

https://lore.kernel.org/linux-mm/CAKEwX=MwrRc43iM2050v5u-TPUK4Yn+a4G7+h6ieKhpQ7WtQ=A@xxxxxxxxxxxxxx/

But I think, as Takero pointed out, that it would still skip over the
memcg that the memcg offline callback (concurrently) stored in
zswap_next_shrink.
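
To make the skip concrete, here is a rough single-threaded userspace
model, not kernel code. All model_* names are made up for illustration,
the lock is only marked in comments, refcounting is not modeled, and the
race is injected deterministically through the hypothetical "racing"
argument, which stands in for the offline callback running in the
unlock window.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mem_cgroup: a singly linked list node. */
struct model_memcg {
	int id;
	bool online;
	struct model_memcg *next;
};

static struct model_memcg *model_root;        /* first memcg of the walk */
static struct model_memcg *model_next_shrink; /* models zswap_next_shrink */

/* Models mem_cgroup_iter(NULL, prev, NULL): NULL prev restarts the walk. */
static struct model_memcg *model_iter(struct model_memcg *prev)
{
	return prev ? prev->next : model_root;
}

/* Models the offline callback: move the cursor off a dying memcg. */
static void model_offline_cleanup(struct model_memcg *memcg)
{
	if (model_next_shrink != memcg)
		return;
	do {
		model_next_shrink = model_iter(model_next_shrink);
	} while (model_next_shrink && !model_next_shrink->online);
}

/*
 * Models the "drop the lock, then continue" worker loop. Returns the
 * memcg that would be shrunk next, or NULL at the end of the walk.
 */
static struct model_memcg *model_pick_next(struct model_memcg *racing)
{
	for (;;) {
		struct model_memcg *memcg;

		/* the shrink lock would be taken here */
		model_next_shrink = model_iter(model_next_shrink);
		memcg = model_next_shrink;
		/* the shrink lock would be dropped here */

		if (!memcg || memcg->online)
			return memcg;

		/* unlock window: the offline callback may move the cursor */
		if (racing) {
			model_offline_cleanup(racing);
			racing = NULL;
		}
		/* "continue": the next pass advances the cursor once more */
	}
}
```

With a list A (offline) -> B (online) -> C (online) and the cursor
starting at NULL, model_pick_next(NULL) returns B. With
model_pick_next(&A), the callback moves the cursor to B inside the
window, the worker then advances past it, and C is returned without B
ever being looked at.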