On Tue, May 30, 2023 at 9:53 AM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote:
>
> On Tue, May 30, 2023 at 9:22 AM Nhat Pham <nphamcs@xxxxxxxxx> wrote:
> >
> > Before storing a page, zswap first checks if the number of stored pages
> > exceeds the limit specified by memory.zswap.max, for each cgroup in the
> > hierarchy. If this limit is reached or exceeded, then zswap shrinking is
> > triggered and short-circuits the store attempt.
> >
> > However, if memory.zswap.max = 0 for a cgroup, no amount of writeback
> > will allow future store attempts from processes in this cgroup to
> > succeed. Furthermore, this creates a pathological behavior in a system
> > where some cgroups have memory.zswap.max = 0 and some do not: the
> > processes in the former cgroups, under memory pressure, will evict pages
> > stored by the latter continually, until the need for swap ceases or the
> > pool becomes empty.
> >
> > As a result of this, we observe a disproportionate amount of zswap
> > writeback and a perpetually small zswap pool in our experiments, even
> > though the pool limit is never hit.
> >
> > This patch fixes the issue by rejecting zswap store attempts without
> > shrinking the pool when memory.zswap.max is 0.
> >
> > Fixes: f4840ccfca25 ("zswap: memcg accounting")
> > Signed-off-by: Nhat Pham <nphamcs@xxxxxxxxx>
> > ---
> >  include/linux/memcontrol.h | 6 +++---
> >  mm/memcontrol.c            | 8 ++++----
> >  mm/zswap.c                 | 9 +++++++--
> >  3 files changed, 14 insertions(+), 9 deletions(-)
> >
> > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> > index 222d7370134c..507bed3a28b0 100644
> > --- a/include/linux/memcontrol.h
> > +++ b/include/linux/memcontrol.h
> > @@ -1899,13 +1899,13 @@ static inline void count_objcg_event(struct obj_cgroup *objcg,
> >  #endif /* CONFIG_MEMCG_KMEM */
> >
> >  #if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_ZSWAP)
> > -bool obj_cgroup_may_zswap(struct obj_cgroup *objcg);
> > +int obj_cgroup_may_zswap(struct obj_cgroup *objcg);
> >  void obj_cgroup_charge_zswap(struct obj_cgroup *objcg, size_t size);
> >  void obj_cgroup_uncharge_zswap(struct obj_cgroup *objcg, size_t size);
> >  #else
> > -static inline bool obj_cgroup_may_zswap(struct obj_cgroup *objcg)
> > +static inline int obj_cgroup_may_zswap(struct obj_cgroup *objcg)
> >  {
> > -	return true;
> > +	return 0;
> >  }
> >  static inline void obj_cgroup_charge_zswap(struct obj_cgroup *objcg,
> >  					   size_t size)
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 4b27e245a055..09aad0e6f2ea 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -7783,10 +7783,10 @@ static struct cftype memsw_files[] = {
> >   * spending cycles on compression when there is already no room left
> >   * or zswap is disabled altogether somewhere in the hierarchy.
> >   */
> > -bool obj_cgroup_may_zswap(struct obj_cgroup *objcg)
> > +int obj_cgroup_may_zswap(struct obj_cgroup *objcg)
> >  {
> >  	struct mem_cgroup *memcg, *original_memcg;
> > -	bool ret = true;
> > +	int ret = 0;
> >
> >  	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
> >  		return true;
> >
> > @@ -7800,7 +7800,7 @@ bool obj_cgroup_may_zswap(struct obj_cgroup *objcg)
> >  		if (max == PAGE_COUNTER_MAX)
> >  			continue;
> >  		if (max == 0) {
> > -			ret = false;
> > +			ret = -ENODEV;
> >  			break;
> >  		}
> >
> > @@ -7808,7 +7808,7 @@ bool obj_cgroup_may_zswap(struct obj_cgroup *objcg)
> >  		pages = memcg_page_state(memcg, MEMCG_ZSWAP_B) / PAGE_SIZE;
> >  		if (pages < max)
> >  			continue;
> > -		ret = false;
> > +		ret = -ENOMEM;
> >  		break;
> >  	}
> >  	mem_cgroup_put(original_memcg);
> > diff --git a/mm/zswap.c b/mm/zswap.c
> > index 59da2a415fbb..7b13dc865438 100644
> > --- a/mm/zswap.c
> > +++ b/mm/zswap.c
> > @@ -1175,8 +1175,13 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset,
> >  	}
> >
> >  	objcg = get_obj_cgroup_from_page(page);
> > -	if (objcg && !obj_cgroup_may_zswap(objcg))
> > -		goto shrink;
> > +	if (objcg) {
> > +		ret = obj_cgroup_may_zswap(objcg);
> > +		if (ret == -ENODEV)
> > +			goto reject;
> > +		if (ret == -ENOMEM)
> > +			goto shrink;
> > +	}
>
> I wonder if we should just make this:
>
> if (objcg && !obj_cgroup_may_zswap(objcg))
>         goto reject;
>
> Even if memory.zswap.max is > 0, if the limit is hit, shrinking the
> zswap pool will only help if we happen to write back a page from the
> same memcg that hit its limit. Keep in mind that we will only write
> back one page every time we observe that the limit is hit (even with
> Domenico's patch, because zswap_can_accept() should be true).
>
> On a system with a handful of memcgs, it seems likely that we
> wrongfully write back pages from other memcgs because of this,
> achieving nothing for this memcg while hurting others. OTOH, without
> invoking writeback when the limit is hit, the memcg will just not be
> able to use zswap until some pages are faulted back in or invalidated.
>
> I am not sure which is better, just thinking out loud.
>
> Seems like this can be solved by having per-memcg LRUs, or at least
> providing an argument to the shrinker of which memcg to reclaim from.
> This would only be possible when the LRU is moved to zswap.

I totally agree! This seems like the logical next step in zswap's
evolution. I actually proposed this fix with this future development in
mind - with a per-memcg LRU, we can trigger memcg-specific shrinking in
place of this indiscriminate writeback. It seems a less drastic change
(compared to removing shrinking here now, then reintroducing it later).

Thanks for the feedback, Yosry!

> >
> >  	/* reclaim space if needed */
> >  	if (zswap_is_full()) {
> > --
> > 2.34.1
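
[Editor's note: the dynamic the commit message describes can be sketched
outside the kernel. Below is a minimal, hypothetical userspace model -
not kernel code; toy_cgroup, toy_store() and toy_shrink() are invented
names - of a single shared LRU over the zswap pool, with one cgroup at
memory.zswap.max = 0 and one without a limit. It illustrates why, before
this patch, store attempts from the disabled cgroup could only evict the
other cgroup's pages, and how the -ENODEV/-ENOMEM split avoids that.]

#include <stdio.h>
#include <errno.h>

#define POOL_CAP   8   /* toy pool capacity, in pages */
#define UNLIMITED -1

struct toy_cgroup {
	const char *name;
	int zswap_max;     /* pages; 0 disables zswap, UNLIMITED = no cap */
	int stored;        /* pages this cgroup currently has in the pool */
};

/* Global LRU: the owner of each stored page, oldest first. */
static struct toy_cgroup *lru[POOL_CAP];
static int lru_len;

/* Write back the oldest page, whichever cgroup owns it. */
static void toy_shrink(void)
{
	int i;

	if (!lru_len)
		return;
	lru[0]->stored--;
	printf("  writeback: evicted a page of '%s'\n", lru[0]->name);
	for (i = 1; i < lru_len; i++)
		lru[i - 1] = lru[i];
	lru_len--;
}

/* Mirrors the patched obj_cgroup_may_zswap() return convention. */
static int toy_may_zswap(struct toy_cgroup *cg)
{
	if (cg->zswap_max == 0)
		return -ENODEV;   /* zswap disabled: writeback cannot help */
	if (cg->zswap_max != UNLIMITED && cg->stored >= cg->zswap_max)
		return -ENOMEM;   /* over limit: shrinking might help */
	return 0;
}

static void toy_store(struct toy_cgroup *cg, int old_behavior)
{
	int ret = toy_may_zswap(cg);

	if (ret == -ENODEV && !old_behavior) {
		printf("  '%s': store rejected, no writeback\n", cg->name);
		return;
	}
	if (ret) {
		/* -ENOMEM, or any failure under the old behavior */
		toy_shrink();
		return;
	}
	lru[lru_len++] = cg;
	cg->stored++;
}

int main(void)
{
	struct toy_cgroup a = { "A (zswap.max=0)", 0, 0 };
	struct toy_cgroup b = { "B (no limit)", UNLIMITED, 0 };
	int i;

	for (i = 0; i < 4; i++)     /* B fills part of the pool */
		toy_store(&b, 0);

	printf("old behavior, A under memory pressure:\n");
	for (i = 0; i < 3; i++)
		toy_store(&a, 1);
	printf("  B has %d of its 4 pages left\n", b.stored);

	printf("patched behavior, A under memory pressure:\n");
	for (i = 0; i < 3; i++)
		toy_store(&a, 0);
	printf("  B still has %d page(s)\n", b.stored);
	return 0;
}

[In the first phase, each of A's doomed store attempts writes back one
of B's pages even though B never exceeded any limit; in the second
phase, A's attempts are rejected outright and B's pages are untouched,
matching the behavior the patch description claims.]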