RE: [PATCH v7 6/8] mm: zswap: Support mTHP swapout in zswap_store().

> -----Original Message-----
> From: Yosry Ahmed <yosryahmed@xxxxxxxxxx>
> Sent: Thursday, September 26, 2024 10:20 AM
> To: Sridhar, Kanchana P <kanchana.p.sridhar@xxxxxxxxx>
> Cc: Johannes Weiner <hannes@xxxxxxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx;
> linux-mm@xxxxxxxxx; nphamcs@xxxxxxxxx; chengming.zhou@xxxxxxxxx;
> usamaarif642@xxxxxxxxx; shakeel.butt@xxxxxxxxx; ryan.roberts@xxxxxxx;
> Huang, Ying <ying.huang@xxxxxxxxx>; 21cnbao@xxxxxxxxx; akpm@linux-
> foundation.org; Zou, Nanhai <nanhai.zou@xxxxxxxxx>; Feghali, Wajdi K
> <wajdi.k.feghali@xxxxxxxxx>; Gopal, Vinodh <vinodh.gopal@xxxxxxxxx>
> Subject: Re: [PATCH v7 6/8] mm: zswap: Support mTHP swapout in
> zswap_store().
> 
> On Thu, Sep 26, 2024 at 9:40 AM Sridhar, Kanchana P
> <kanchana.p.sridhar@xxxxxxxxx> wrote:
> >
> > > -----Original Message-----
> > > From: Yosry Ahmed <yosryahmed@xxxxxxxxxx>
> > > Sent: Wednesday, September 25, 2024 9:52 PM
> > > To: Sridhar, Kanchana P <kanchana.p.sridhar@xxxxxxxxx>
> > > Cc: Johannes Weiner <hannes@xxxxxxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx;
> > > linux-mm@xxxxxxxxx; nphamcs@xxxxxxxxx; chengming.zhou@xxxxxxxxx;
> > > usamaarif642@xxxxxxxxx; shakeel.butt@xxxxxxxxx; ryan.roberts@xxxxxxx;
> > > Huang, Ying <ying.huang@xxxxxxxxx>; 21cnbao@xxxxxxxxx; akpm@linux-
> > > foundation.org; Zou, Nanhai <nanhai.zou@xxxxxxxxx>; Feghali, Wajdi K
> > > <wajdi.k.feghali@xxxxxxxxx>; Gopal, Vinodh <vinodh.gopal@xxxxxxxxx>
> > > Subject: Re: [PATCH v7 6/8] mm: zswap: Support mTHP swapout in
> > > zswap_store().
> > >
> > > [..]
> > > >
> > > > One thing I realized while reworking the patches for the batched
> > > > checks is: within zswap_store_page(), we set the entry->objcg and
> > > > entry->pool before adding it to the xarray. Given this, wouldn't it
> > > > be safer to get the objcg and pool reference per sub-page, locally
> > > > in zswap_store_page(), rather than obtaining batched references at
> > > > the end if the store is successful? If we want zswap_store_page()
> > > > to be self-contained and correct as far as the entry being created
> > > > and added to the xarray, it seems like the right thing to do?
> > > > I am a bit apprehensive about the entry being added to the xarray
> > > > without a reference obtained on the objcg and pool, because any
> > > > page-faults/writeback that occur on sub-pages added to the xarray
> > > > before the entire folio has been stored, would run into issues.
> > >
> > > We definitely should not obtain references to the pool and objcg after
> > > initializing the entries with them. We can obtain all references in
> > > zswap_store() before zswap_store_page(). IOW, the batching in this
> > > case should be done before the per-page operations, not after.
> >
> > Thanks Yosry. IIUC, we should obtain all references to the objcg and to the
> > zswap_pool at the start of zswap_store.
> >
> > In the case of error on any sub-page, we will unwind state for potentially
> > only the stored pages or the entire folio if it happened to already be in
> > zswap and is being re-written. We might need some additional book-keeping
> > to keep track of which sub-pages were found in the xarray and
> > zswap_entry_free() got called (nr_sb). Assuming I define a new
> > "obj_cgroup_put_many()", I would need to call this with
> > (folio_nr_pages() - nr_sb).
> >
> > As far as zswap_pool_get(), there is some added complexity if we want to
> > keep the existing implementation that calls "percpu_ref_tryget()", and
> > assuming this is extended to provide a new "zswap_pool_get_many()" that
> > calls "percpu_ref_tryget_many()". Is there a reason we use
> > percpu_ref_tryget() instead of percpu_ref_get()? Reason I ask is, with
> > tryget(), if for some reason the pool->ref is 0, no further increments
> > will be made. If so, upon unwinding state in zswap_store(), I would need
> > to special-case to catch this before calling a new
> > "zswap_pool_put_many()".
> >
> > Things could be a little simpler if zswap_pool_get() can use
> > "percpu_ref_get()" which will always increment the refcount. Since the
> > zswap pool->ref is initialized to "1", this seems Ok, but I don't know if
> > there will be unintended consequences.
> >
> > Can you please advise on what is the simplest/cleanest approach:
> >
> > 1) Proceed with the above changes without changing percpu_ref_tryget in
> >     zswap_pool_get. Needs special-casing in zswap_store to detect
> >     pool->ref being "0" before calling zswap_pool_put[_many].
> 
> My assumption is that we can reorder the code such that if
> zswap_pool_get_many() fails we don't call zswap_pool_put_many() to
> begin with (e.g. jump to a label after zswap_pool_put_many()).

However, the pool refcount could change between the start and end of
zswap_store.
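
For reference, the reordering suggested above can be modeled in a small
userspace sketch. This is not kernel code: struct model_pool, the
model_pool_*() helpers and model_store() are hypothetical stand-ins, and
the sketch assumes that on failure all of the batched references are
dropped in one call. It only illustrates that a failed batched tryget
jumps past the put, so no special-casing on a zero refcount is needed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the pool refcount; not kernel code. */
struct model_pool {
	long refs;	/* 0 models a pool that is already being torn down */
};

/* Models a batched tryget: takes no references once refs has hit zero. */
static bool model_pool_tryget_many(struct model_pool *pool, long nr)
{
	if (pool->refs == 0)
		return false;
	pool->refs += nr;
	return true;
}

/* Models a batched put of nr references. */
static void model_pool_put_many(struct model_pool *pool, long nr)
{
	assert(pool->refs >= nr);
	pool->refs -= nr;
}

/*
 * Models the store path for a folio of nr_pages sub-pages. fail_at is the
 * index of the sub-page whose store fails, or -1 if every store succeeds.
 */
static bool model_store(struct model_pool *pool, long nr_pages, long fail_at)
{
	long i;

	/* Batch all references up front, before any per-page work. */
	if (!model_pool_tryget_many(pool, nr_pages))
		goto reject;	/* nothing was taken, so skip the put */

	for (i = 0; i < nr_pages; i++) {
		if (i == fail_at)
			goto put_refs;
	}
	return true;	/* the stored entries now own the references */

put_refs:
	model_pool_put_many(pool, nr_pages);
reject:
	return false;
}
```

In this model the refcount is unchanged after any failed store, whether
the failure came from the tryget or from a sub-page.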

> 
> > 2) Modify zswap_pool_get/zswap_pool_get_many to use percpu_ref_get_many
> >     and avoid special-casing to detect pool->ref being "0" before calling
> >     zswap_pool_put[_many].
> 
> I don't think we can simply switch the tryget to a get, as I believe
> we can race with the pool being destroyed.

That was my initial thought as well, but going by the existing
implementation, I figured this couldn't happen since pool->ref is
initialized to "1". In any case, I understand the intent behind using
"tryget"; it just adds to the considerations for reference batching.
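
To illustrate the difference being discussed, here is a single-threaded
userspace model. It is only a sketch: struct model_ref and the
model_ref_*() helpers are hypothetical and do not reproduce the
percpu_ref implementation. The assumption modeled is that the initial
reference of "1" is dropped when the owner kills the ref, after which
the count can reach zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded refcount model; not the percpu_ref API. */
struct model_ref {
	long count;	/* starts at 1: the initial reference */
	bool released;	/* set once count drops to zero */
};

static void model_ref_init(struct model_ref *ref)
{
	ref->count = 1;
	ref->released = false;
}

static void model_ref_put_many(struct model_ref *ref, long nr)
{
	assert(ref->count >= nr);
	ref->count -= nr;
	if (ref->count == 0)
		ref->released = true;	/* teardown would start here */
}

/* Models "kill": the owner drops the initial reference. */
static void model_ref_kill(struct model_ref *ref)
{
	model_ref_put_many(ref, 1);
}

/* Unconditional get: increments even if the object was already released. */
static void model_ref_get_many(struct model_ref *ref, long nr)
{
	ref->count += nr;
}

/* Conditional get: fails once the count has reached zero. */
static bool model_ref_tryget_many(struct model_ref *ref, long nr)
{
	if (ref->count == 0)
		return false;
	ref->count += nr;
	return true;
}
```

In the model, an unconditional get after the kill leaves a non-zero
count on an object that is already marked released, which is the kind of
race the conditional variant refuses.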

> 
> > 3) Keep the approach in v7 where obj_cgroup_get/put is localized to
> >     zswap_store_page for both success and error conditions, and any
> >     unwinding state in zswap_store will take care of dropping references
> >     obtained from prior successful writes (from this or prior invocations
> >     of zswap_store).
> 
> I am also fine with doing that and doing the reference batching as a follow up.

I think so too! We could try and improve upon (3) with reference batching
in a follow-up patch.
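
As a rough userspace model of what (3) amounts to: each sub-page takes
its own reference before its entry is published, and unwinding frees the
already-published entries, each of which drops the reference it owns.
All names here (struct model_objcg, struct model_entry, struct
model_tree, the model_*() functions) are hypothetical, and the model
only covers entries stored by the current call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_PAGES 16

/* Hypothetical stand-ins; none of these are kernel types. */
struct model_objcg { long refs; };
struct model_entry { struct model_objcg *objcg; bool present; };
struct model_tree { struct model_entry slot[MODEL_MAX_PAGES]; };

/* Frees one published entry and drops the reference it owns. */
static void model_entry_free(struct model_tree *tree, long idx)
{
	struct model_entry *e = &tree->slot[idx];

	if (!e->present)
		return;
	e->objcg->refs--;
	e->objcg = NULL;
	e->present = false;
}

/* Per-page store: take the reference first, then publish the entry. */
static bool model_store_page(struct model_tree *tree,
			     struct model_objcg *objcg, long idx, bool fail)
{
	objcg->refs++;
	if (fail) {
		objcg->refs--;	/* local error path drops its own reference */
		return false;
	}
	tree->slot[idx].objcg = objcg;
	tree->slot[idx].present = true;	/* published with a reference held */
	return true;
}

/* fail_at is the index of the sub-page that fails, or -1 for none. */
static bool model_store(struct model_tree *tree, struct model_objcg *objcg,
			long nr_pages, long fail_at)
{
	long i;

	assert(nr_pages <= MODEL_MAX_PAGES);
	for (i = 0; i < nr_pages; i++) {
		if (!model_store_page(tree, objcg, i, i == fail_at))
			goto unwind;
	}
	return true;

unwind:
	/* Free the sub-pages stored before the failure. */
	while (i-- > 0)
		model_entry_free(tree, i);
	return false;
}
```

No entry is ever visible in the model's tree without a reference held,
at the cost of one get/put per sub-page instead of one batched call.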

Thanks,
Kanchana

> 
> 
> >
> > Thanks,
> > Kanchana
> >
> > >
> > > >
> > > > Just wanted to run this by you. The rest of the batched charging, atomic
> > > > and stat updates should be Ok.
> > > >
> > > > Thanks,
> > > > Kanchana
> > > >
> > > > >
> > > > > Thanks,
> > > > > Kanchana



