RE: [PATCH v2 mm-hotfixes] mm/zswap: fix inconsistent charging when zswap_store_page() fails

> -----Original Message-----
> From: Hyeonggon Yoo <42.hyeyoo@xxxxxxxxx>
> Sent: Tuesday, January 28, 2025 10:40 PM
> To: Sridhar, Kanchana P <kanchana.p.sridhar@xxxxxxxxx>
> Cc: Johannes Weiner <hannes@xxxxxxxxxxx>; Yosry Ahmed
> <yosryahmed@xxxxxxxxxx>; Nhat Pham <nphamcs@xxxxxxxxx>; Chengming
> Zhou <chengming.zhou@xxxxxxxxx>; Andrew Morton
> <akpm@linux-foundation.org>; linux-mm@xxxxxxxxx; stable@xxxxxxxxxxxxxxx
> Subject: Re: [PATCH v2 mm-hotfixes] mm/zswap: fix inconsistent charging
> when zswap_store_page() fails
> 
> On Wed, Jan 29, 2025 at 4:09 AM Sridhar, Kanchana P
> <kanchana.p.sridhar@xxxxxxxxx> wrote:
> >
> > Hi Hyeonggon,
> >
> > > -----Original Message-----
> > > From: Hyeonggon Yoo <42.hyeyoo@xxxxxxxxx>
> > > Sent: Tuesday, January 28, 2025 10:55 AM
> > > To: Sridhar, Kanchana P <kanchana.p.sridhar@xxxxxxxxx>; Johannes Weiner
> > > <hannes@xxxxxxxxxxx>; Yosry Ahmed <yosryahmed@xxxxxxxxxx>; Nhat
> > > Pham <nphamcs@xxxxxxxxx>; Chengming Zhou
> > > <chengming.zhou@xxxxxxxxx>; Andrew Morton
> > > <akpm@linux-foundation.org>
> > > Cc: linux-mm@xxxxxxxxx; Hyeonggon Yoo <42.hyeyoo@xxxxxxxxx>;
> > > stable@xxxxxxxxxxxxxxx
> > > Subject: [PATCH v2 mm-hotfixes] mm/zswap: fix inconsistent charging when
> > > zswap_store_page() fails
> > >
> > > Commit b7c0ccdfbafd ("mm: zswap: support large folios in zswap_store()")
> > > skips charging any zswapped base pages when it failed to zswap the entire
> > > folio.
> > >
> > > However, when some base pages are zswapped but it failed to zswap
> > > the entire folio, the zswap operation is rolled back.
> > > When freeing zswap entries for those pages, zswap_entry_free() uncharges
> > > the pages that were not previously charged, causing zswap charging to
> > > become inconsistent.
> > >
> > > This inconsistency triggers two warnings with the following steps:
> > >   # On a machine with 64GiB of RAM and 36GiB of zswap
> > >   $ stress-ng --bigheap 2 # wait until the OOM-killer kills stress-ng
> > >   $ sudo reboot
> > >
> > >   The two warnings are:
> > >     in mm/memcontrol.c:163, function obj_cgroup_release():
> > >       WARN_ON_ONCE(nr_bytes & (PAGE_SIZE - 1));
> > >
> > >     in mm/page_counter.c:60, function page_counter_cancel():
> > >       if (WARN_ONCE(new < 0, "page_counter underflow: %ld nr_pages=%lu\n",
> > >         new, nr_pages))
> > >
> > > While objcg events should only be accounted for when the entire folio is
> > > zswapped, objcg charging should be performed regardless.
> > > Fix accordingly.
> > >
> > > After resolving the inconsistency, these warnings disappear.
> > >
> > > Fixes: b7c0ccdfbafd ("mm: zswap: support large folios in zswap_store()")
> > > Cc: stable@xxxxxxxxxxxxxxx
> > > Signed-off-by: Hyeonggon Yoo <42.hyeyoo@xxxxxxxxx>
> > > ---
> > >
> > > v1->v2:
> > >
> > >  Fixed objcg events being accounted for on zswap failure.
> > >
> > >  Fixed the incorrect description. I misunderstood that the base pages are
> > >  going to be stored in zswap, but their zswap entries are freed immediately.
> > >
> > >  Added a comment on why it charges pages that are going to be removed
> > >  from zswap.
> > >
> > >  mm/zswap.c | 14 ++++++++++----
> > >  1 file changed, 10 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/mm/zswap.c b/mm/zswap.c
> > > index 6504174fbc6a..10b30ac46deb 100644
> > > --- a/mm/zswap.c
> > > +++ b/mm/zswap.c
> > > @@ -1568,20 +1568,26 @@ bool zswap_store(struct folio *folio)
> > >
> > >               bytes = zswap_store_page(page, objcg, pool);
> > >               if (bytes < 0)
> > > -                     goto put_pool;
> > > +                     goto charge_zswap;
> > >               compressed_bytes += bytes;
> > >       }
> > >
> > > -     if (objcg) {
> > > -             obj_cgroup_charge_zswap(objcg, compressed_bytes);
> > > +     if (objcg)
> > >               count_objcg_events(objcg, ZSWPOUT, nr_pages);
> > > -     }
> > >
> > >       atomic_long_add(nr_pages, &zswap_stored_pages);
> > >       count_vm_events(ZSWPOUT, nr_pages);
> > >
> > >       ret = true;
> > >
> > > +charge_zswap:
> > > +     /*
> > > +      * Charge zswapped pages even when it failed to zswap the entire folio,
> > > +      * because zswap_entry_free() will uncharge them anyway.
> > > +      * Otherwise zswap charging will become inconsistent.
> > > +      */
> > > +     if (objcg)
> > > +             obj_cgroup_charge_zswap(objcg, compressed_bytes);
> >
> > Thanks for finding this bug! I am thinking it might make sense to charge
> > and increment the zswap_stored_pages counter in zswap_store_page().
> > Something like:
> >
> > diff --git a/mm/zswap.c b/mm/zswap.c
> > index b84c20d889b1..fd2a72598a8a 100644
> > --- a/mm/zswap.c
> > +++ b/mm/zswap.c
> > @@ -1504,11 +1504,14 @@ static ssize_t zswap_store_page(struct page *page,
> >         entry->pool = pool;
> >         entry->swpentry = page_swpentry;
> >         entry->objcg = objcg;
> > +       if (objcg)
> > +               obj_cgroup_charge_zswap(objcg, entry->length);
> >         entry->referenced = true;
> >         if (entry->length) {
> >                 INIT_LIST_HEAD(&entry->lru);
> >                 zswap_lru_add(&zswap_list_lru, entry);
> >         }
> > +       atomic_long_inc(&zswap_stored_pages);
> >
> >         return entry->length;
> >
> > @@ -1526,7 +1529,6 @@ bool zswap_store(struct folio *folio)
> >         struct obj_cgroup *objcg = NULL;
> >         struct mem_cgroup *memcg = NULL;
> >         struct zswap_pool *pool;
> > -       size_t compressed_bytes = 0;
> >         bool ret = false;
> >         long index;
> >
> > @@ -1569,15 +1571,11 @@ bool zswap_store(struct folio *folio)
> >                 bytes = zswap_store_page(page, objcg, pool);
> >                 if (bytes < 0)
> >                         goto put_pool;
> > -               compressed_bytes += bytes;
> >         }
> >
> > -       if (objcg) {
> > -               obj_cgroup_charge_zswap(objcg, compressed_bytes);
> > +       if (objcg)
> >                 count_objcg_events(objcg, ZSWPOUT, nr_pages);
> > -       }
> >
> > -       atomic_long_add(nr_pages, &zswap_stored_pages);
> >         count_vm_events(ZSWPOUT, nr_pages);
> >
> >         ret = true;
> 
> Hi Sridhar, it looks much clearer!
> And we can optimize if it turns out to be worth the complexity.
> 
> May I ask your permission to add your Signed-off-by: and Co-developed-by:?

Yes, please go ahead Hyeonggon.

Thanks!
Kanchana

> I'm afraid to use this without your confirmation due to the
> Developer's Certificate of Origin.
> 
> Best,
> Hyeonggon
> 
> > >  put_pool:
> > >       zswap_pool_put(pool);
> > >  put_objcg:
> > > --
> > > 2.47.1
> >
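
The accounting problem discussed in this thread can be illustrated with a minimal userspace sketch. It is not kernel code: the `toy_*` names, the signed byte counter standing in for the objcg charge, and the fixed 100-byte compressed size are all assumptions made for illustration. The only behavior taken from the thread is that freeing an entry always uncharges it, while the charge is either applied once after the whole folio succeeds (the behavior being fixed) or as each entry is created (the suggested approach).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an objcg's zswap charge, in bytes.
 * Signed so that an underflow shows up as a negative value. */
struct toy_objcg {
	long charged_bytes;
};

/* Assumption: every page compresses to the same size. */
#define TOY_ENTRY_BYTES 100

static void toy_charge(struct toy_objcg *cg, long bytes)
{
	cg->charged_bytes += bytes;
}

static void toy_uncharge(struct toy_objcg *cg, long bytes)
{
	cg->charged_bytes -= bytes;
}

/*
 * Store nr_pages base pages; the page at index fail_at fails
 * (pass fail_at == nr_pages for no failure). On failure, every entry
 * stored so far is freed, and freeing always uncharges.
 *
 * charge_per_page == false: charge once, only if the whole folio succeeded.
 * charge_per_page == true:  charge as each entry is created.
 */
static bool toy_store_folio(struct toy_objcg *cg, size_t nr_pages,
			    size_t fail_at, bool charge_per_page)
{
	long compressed_bytes = 0;
	size_t stored = 0;
	size_t i;

	assert(fail_at <= nr_pages);

	for (i = 0; i < nr_pages; i++) {
		if (i == fail_at)
			goto rollback;
		if (charge_per_page)
			toy_charge(cg, TOY_ENTRY_BYTES);
		compressed_bytes += TOY_ENTRY_BYTES;
		stored++;
	}

	if (!charge_per_page)
		toy_charge(cg, compressed_bytes);
	return true;

rollback:
	/* Free the entries already stored; each free uncharges. */
	for (i = 0; i < stored; i++)
		toy_uncharge(cg, TOY_ENTRY_BYTES);
	return false;
}
```

With a four-page folio whose third page fails, the charge-on-success variant leaves the counter at -200 (two entries uncharged that were never charged), whereas the per-page variant returns it to 0.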



