Re: [PATCH v8 3/6] zswap: make shrinking memcg-aware

Nhat Pham <nphamcs@xxxxxxxxx> · Tue, 5 Dec 2023 11:09:30 -0800

On Tue, Dec 5, 2023 at 11:00 AM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote:
>
> [..]
> > > >  static void shrink_worker(struct work_struct *w)
> > > >  {
> > > >         struct zswap_pool *pool = container_of(w, typeof(*pool),
> > > >                                                 shrink_work);
> > > > +       struct mem_cgroup *memcg;
> > > >         int ret, failures = 0;
> > > >
> > > > +       /* global reclaim will select cgroup in a round-robin fashion. */
> > > >         do {
> > > > -               ret = zswap_reclaim_entry(pool);
> > > > -               if (ret) {
> > > > -                       zswap_reject_reclaim_fail++;
> > > > -                       if (ret != -EAGAIN)
> > > > +               spin_lock(&zswap_pools_lock);
> > > > +               pool->next_shrink = mem_cgroup_iter(NULL, pool->next_shrink, NULL);
> > > > +               memcg = pool->next_shrink;
> > > > +
> > > > +               /*
> > > > +                * We need to retry if we have gone through a full round trip, or if we
> > > > +                * got an offline memcg (or else we risk undoing the effect of the
> > > > +                * zswap memcg offlining cleanup callback). This is not catastrophic
> > > > +                * per se, but it will keep the now offlined memcg hostage for a while.
> > > > +                *
> > > > +                * Note that if we got an online memcg, we will keep the extra
> > > > +                * reference in case the original reference obtained by mem_cgroup_iter
> > > > +                * is dropped by the zswap memcg offlining callback, ensuring that the
> > > > +                * memcg is not killed when we are reclaiming.
> > > > +                */
> > > > +               if (!memcg) {
> > > > +                       spin_unlock(&zswap_pools_lock);
> > > > +                       if (++failures == MAX_RECLAIM_RETRIES)
> > > >                                 break;
> > > > +
> > > > +                       goto resched;
> > > > +               }
> > > > +
> > > > +               if (!mem_cgroup_online(memcg)) {
> > > > +                       /* drop the reference from mem_cgroup_iter() */
> > > > +                       mem_cgroup_put(memcg);
> > >
> > > Probably better to use mem_cgroup_iter_break() here?
> >
> > mem_cgroup_iter_break(NULL, memcg) seems to perform the same thing, right?
>
> Yes, but it's better to break the iteration with the documented API
> (e.g. if mem_cgroup_iter_break() changes to do extra work).

Hmm, a mostly aesthetic fix to me, but I don't have a strong opinion otherwise.

>
> >
> > >
> > > Also, I don't see mem_cgroup_tryget_online() being used here (where I
> > > expected it to be used), did I miss it?
> >
> > Oh shoot yeah that was a typo - it should be
> > mem_cgroup_tryget_online(). Let me send a fix to that.
> >
> > >
> > > > +                       pool->next_shrink = NULL;
> > > > +                       spin_unlock(&zswap_pools_lock);
> > > > +
> > > >                         if (++failures == MAX_RECLAIM_RETRIES)
> > > >                                 break;
> > > > +
> > > > +                       goto resched;
> > > >                 }
> > > > +               spin_unlock(&zswap_pools_lock);
> > > > +
> > > > +               ret = shrink_memcg(memcg);
> > >
> > > We just checked for online-ness above, and then shrink_memcg() checks
> > > it again. Is this intentional?
> >
> > Hmm these two checks are for two different purposes. The check above
> > is mainly to prevent accidentally undoing the offline cleanup callback
> > during memcg selection step. Inside shrink_memcg(), we check
> > onlineness again to prevent reclaiming from offlined memcgs - which in
> > effect will trigger the reclaim of the parent's memcg.
>
> Right, but two checks in close proximity are not doing a lot.
> Especially that the memcg online-ness can change right after the check
> inside shrink_memcg() anyway, so it's a best effort thing.
>
> Anyway, it shouldn't matter much. We can leave it.
>
> >
> > >
> > > > +               /* drop the extra reference */
> > >
> > > Where does the extra reference come from?
> >
> > The extra reference is from mem_cgroup_tryget_online(). We get two
> > references in the dance above - one from mem_cgroup_iter() (which can
> > be dropped) and one extra from mem_cgroup_tryget_online(). I kept the
> > second one in case the first one was dropped by the zswap memcg
> > offlining callback, but after reclaiming it is safe to just drop it.
>
> Right. I was confused by the missing mem_cgroup_tryget_online().
>
> >
> > >
> > > > +               mem_cgroup_put(memcg);
> > > > +
> > > > +               if (ret == -EINVAL)
> > > > +                       break;
> > > > +               if (ret && ++failures == MAX_RECLAIM_RETRIES)
> > > > +                       break;
> > > > +
> > > > +resched:
> > > >                 cond_resched();
> > > >         } while (!zswap_can_accept());
> > > > -       zswap_pool_put(pool);
> > > >  }
> > > >
> > > >  static struct zswap_pool *zswap_pool_create(char *type, char *compressor)
> [..]
> > > > @@ -1240,15 +1395,15 @@ bool zswap_store(struct folio *folio)
> > > >                 zswap_invalidate_entry(tree, dupentry);
> > > >         }
> > > >         spin_unlock(&tree->lock);
> > > > -
> > > > -       /*
> > > > -        * XXX: zswap reclaim does not work with cgroups yet. Without a
> > > > -        * cgroup-aware entry LRU, we will push out entries system-wide based on
> > > > -        * local cgroup limits.
> > > > -        */
> > > >         objcg = get_obj_cgroup_from_folio(folio);
> > > > -       if (objcg && !obj_cgroup_may_zswap(objcg))
> > > > -               goto reject;
> > > > +       if (objcg && !obj_cgroup_may_zswap(objcg)) {
> > > > +               memcg = get_mem_cgroup_from_objcg(objcg);
> > >
> > > Do we need a reference here? IIUC, this is folio_memcg() and the folio
> > > is locked, so folio_memcg() should remain stable, no?
> >
> > Hmmm obj_cgroup_may_zswap() also holds a reference to the objcg's
> > memcg, so I just followed the patterns to be safe.
>
> Perhaps it's less clear inside obj_cgroup_may_zswap(). We can actually
> pass the folio to obj_cgroup_may_zswap(), add a debug check that the
> folio is locked, and avoid getting the ref there as well. That can be
> done separately. Perhaps Johannes can shed some light on this, if
> there's a different reason why getting a ref there is needed.
>
> For this change, I think the refcount manipulation is unnecessary.

Hmm true. I'm leaning towards playing it safe - worst case scenario,
we can send a follow up patch to optimize this (perhaps for both
places, if neither place requires pinning the memcg). But I'll wait
for Johannes to chime in with his opinions on the matter.

>
> >
> >
> > >
> > > Same for the call below.
> > >
> > > > +               if (shrink_memcg(memcg)) {
> > > > +                       mem_cgroup_put(memcg);
> > > > +                       goto reject;
> > > > +               }
> > > > +               mem_cgroup_put(memcg);
> > > > +       }
> > > >
> > > >         /* reclaim space if needed */
> > > >         if (zswap_is_full()) {
> [..]