RE: [PATCH v2 mm-hotfixes] mm/zswap: fix inconsistent charging when zswap_store_page() fails

> -----Original Message-----
> From: Yosry Ahmed <yosry.ahmed@xxxxxxxxx>
> Sent: Tuesday, January 28, 2025 11:04 AM
> To: Hyeonggon Yoo <42.hyeyoo@xxxxxxxxx>
> Cc: Sridhar, Kanchana P <kanchana.p.sridhar@xxxxxxxxx>; Johannes Weiner
> <hannes@xxxxxxxxxxx>; Nhat Pham <nphamcs@xxxxxxxxx>; Chengming
> Zhou <chengming.zhou@xxxxxxxxx>; Andrew Morton
> <akpm@linux-foundation.org>; linux-mm@xxxxxxxxx; stable@xxxxxxxxxxxxxxx
> Subject: Re: [PATCH v2 mm-hotfixes] mm/zswap: fix inconsistent charging
> when zswap_store_page() fails
> 
> On Wed, Jan 29, 2025 at 03:55:07AM +0900, Hyeonggon Yoo wrote:
> > Commit b7c0ccdfbafd ("mm: zswap: support large folios in zswap_store()")
> > skips charging any zswapped base pages when it failed to zswap the entire
> > folio.
> >
> > However, when some base pages are zswapped but it failed to zswap
> > the entire folio, the zswap operation is rolled back.
> > When freeing zswap entries for those pages, zswap_entry_free() uncharges
> > the pages that were not previously charged, causing zswap charging to
> > become inconsistent.
> >
> > > This inconsistency triggers two warnings with the following steps:
> >   # On a machine with 64GiB of RAM and 36GiB of zswap
> >   $ stress-ng --bigheap 2 # wait until the OOM-killer kills stress-ng
> >   $ sudo reboot
> >
> >   Two warnings are:
> >     in mm/memcontrol.c:163, function obj_cgroup_release():
> >       WARN_ON_ONCE(nr_bytes & (PAGE_SIZE - 1));
> >
> >     in mm/page_counter.c:60, function page_counter_cancel():
> > >       if (WARN_ONCE(new < 0, "page_counter underflow: %ld nr_pages=%lu\n",
> > 	  new, nr_pages))
> >
> > While objcg events should only be accounted for when the entire folio is
> > > zswapped, objcg charging should be performed regardless.
> > Fix accordingly.
> >
> > After resolving the inconsistency, these warnings disappear.
> >
> > Fixes: b7c0ccdfbafd ("mm: zswap: support large folios in zswap_store()")
> > Cc: stable@xxxxxxxxxxxxxxx
> > Signed-off-by: Hyeonggon Yoo <42.hyeyoo@xxxxxxxxx>
> > ---
> >
> > v1->v2:
> >
> >  Fixed objcg events being accounted for on zswap failure.
> >
> >  Fixed the incorrect description. I misunderstood that the base pages are
> >  going to be stored in zswap, but their zswap entries are freed immediately.
> >
> >  Added a comment on why it charges pages that are going to be removed
> >  from zswap.
> >
> >  mm/zswap.c | 14 ++++++++++----
> >  1 file changed, 10 insertions(+), 4 deletions(-)
> >
> > diff --git a/mm/zswap.c b/mm/zswap.c
> > index 6504174fbc6a..10b30ac46deb 100644
> > --- a/mm/zswap.c
> > +++ b/mm/zswap.c
> > @@ -1568,20 +1568,26 @@ bool zswap_store(struct folio *folio)
> >
> >  		bytes = zswap_store_page(page, objcg, pool);
> >  		if (bytes < 0)
> > -			goto put_pool;
> > +			goto charge_zswap;
> >  		compressed_bytes += bytes;
> >  	}
> >
> > -	if (objcg) {
> > -		obj_cgroup_charge_zswap(objcg, compressed_bytes);
> > +	if (objcg)
> >  		count_objcg_events(objcg, ZSWPOUT, nr_pages);
> > -	}
> >
> >  	atomic_long_add(nr_pages, &zswap_stored_pages);
> >  	count_vm_events(ZSWPOUT, nr_pages);
> >
> >  	ret = true;
> >
> > +charge_zswap:
> > +	/*
> > > +	 * Charge zswapped pages even when it failed to zswap the entire folio,
> > +	 * because zswap_entry_free() will uncharge them anyway.
> > +	 * Otherwise zswap charging will become inconsistent.
> > +	 */
> > +	if (objcg)
> > +		obj_cgroup_charge_zswap(objcg, compressed_bytes);
> 
> Thanks for fixing this!
> 
> Having to charge just to uncharge right after is annoying. Ideally we'd
> just clear entry->objcg if we fail before charging, but we don't have a
> direct reference to the entries here and another tree lookup is not
> ideal either.
> 
> I guess we may be able to improve this handling once [1] lands, as we
> can move the charging logic into zswap_store_folio() where we'd have
> access to the entries.

Thanks Yosry. I agree, we can improve this handling in [1]. I will add this
to my list.

> 
> For now, would the control flow be easier if we move the charge ahead of
> the zswap_store_page() loop instead? There is an existing if (objcg)
> block there as well.

I just replied with a suggestion to do the objcg charging and the
zswap_stored_pages increment once per successful xarray store, within
zswap_store_page() itself. Please let me know if this would be a good
solution for the hotfix.
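
To illustrate the difference, here is a rough userspace sketch rather than
kernel code. model_store_folio(), its parameters and the "balance" counter are
hypothetical stand-ins for the objcg zswap charge. The sketch assumes that
rollback always uncharges every entry that was stored, as zswap_entry_free()
is described as doing above, and it compares charging once at the end of a
fully successful store against charging with each successful page store.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical userspace model, not kernel code.
 * sizes[i] is the compressed size of page i; a negative value means the
 * store of that page fails. Returns the charge balance left after the call.
 */
static long model_store_folio(const long *sizes, int nr_pages,
			      bool charge_per_page)
{
	long balance = 0;	/* stands in for the objcg zswap charge */
	long pending = 0;	/* bytes stored but not yet charged */
	int stored = 0;		/* pages stored before any failure */
	bool failed = false;

	assert(sizes != NULL && nr_pages >= 0);

	for (int i = 0; i < nr_pages; i++) {
		if (sizes[i] < 0) {
			failed = true;
			break;
		}
		stored++;
		if (charge_per_page)
			balance += sizes[i];	/* charge with each successful store */
		else
			pending += sizes[i];	/* charge deferred to the end */
	}

	/* Deferred scheme: the charge happens only if every page was stored. */
	if (!failed && !charge_per_page)
		balance += pending;

	/* Rollback: freeing an entry always uncharges it, charged or not. */
	if (failed) {
		for (int i = 0; i < stored; i++)
			balance -= sizes[i];
	}

	return balance;
}
```

With sizes {100, 200, -1, 50}, the deferred scheme ends at -300 (an uncharge
with no matching charge, i.e. the underflow), while the per-page scheme ends
at 0. When every page is stored, both schemes end at the same balance.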

Thanks,
Kanchana

> 
> [1] https://lore.kernel.org/linux-mm/20241221063119.29140-12-kanchana.p.sridhar@xxxxxxxxx/
> 
> >  put_pool:
> >  	zswap_pool_put(pool);
> >  put_objcg:
> > --
> > 2.47.1
> >
> >




