> -----Original Message-----
> From: Nhat Pham <nphamcs@xxxxxxxxx>
> Sent: Monday, September 30, 2024 4:43 PM
> To: Yosry Ahmed <yosryahmed@xxxxxxxxxx>
> Cc: Sridhar, Kanchana P <kanchana.p.sridhar@xxxxxxxxx>;
> linux-kernel@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx; hannes@xxxxxxxxxxx;
> chengming.zhou@xxxxxxxxx; usamaarif642@xxxxxxxxx;
> shakeel.butt@xxxxxxxxx; ryan.roberts@xxxxxxx; Huang, Ying
> <ying.huang@xxxxxxxxx>; 21cnbao@xxxxxxxxx; akpm@xxxxxxxxxxxxxxxxxxxx;
> willy@xxxxxxxxxxxxx; Zou, Nanhai <nanhai.zou@xxxxxxxxx>; Feghali, Wajdi K
> <wajdi.k.feghali@xxxxxxxxx>; Gopal, Vinodh <vinodh.gopal@xxxxxxxxx>
> Subject: Re: [PATCH v9 6/7] mm: zswap: Support large folios in zswap_store().
>
> On Mon, Sep 30, 2024 at 4:34 PM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote:
> >
> > On Mon, Sep 30, 2024 at 4:29 PM Nhat Pham <nphamcs@xxxxxxxxx> wrote:
> > >
> > > On Mon, Sep 30, 2024 at 4:20 PM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote:
> > > >
> > > > On Mon, Sep 30, 2024 at 4:11 PM Nhat Pham <nphamcs@xxxxxxxxx> wrote:
> > > >
> > > > I suggested this in a previous version, and Kanchana faced some
> > > > complexities implementing it:
> > > > https://lore.kernel.org/lkml/SJ0PR11MB56785027ED6FCF673A84CEE6C96A2@SJ0PR11MB5678.namprd11.prod.outlook.com/
> > >
> > > Sorry, I missed that conversation.
> > >
> > > >
> > > > Basically, if we batch get the refs after the store I think it's not
> > > > safe, because once an entry is published to writeback it can be
> > > > written back and freed, and a ref that we never acquired would be
> > > > dropped.
> > >
> > > Hmmm. I don't think writeback could touch any individual subpage just
> > > yet, no?
> > >
> > > Before doing any work, zswap writeback would attempt to add the
> > > subpage to the swap cache (via __read_swap_cache_async()). However,
> > > all subpages will have already been added to the swap cache, and point
> > > to the (large) folio. So zswap_writeback_entry() should short-circuit
> > > here (the if (!page_allocated) case).
> >
> > If it's safe to take the refs after all calls to zswap_store_page()
> > are successful, then yeah, that should be possible, for both the pool
> > and objcg. I didn't look closely though.
> >
> > Just to clarify, you mean grab one ref first, then do the
> > compressions, then grab the remaining refs, right?
>
> Ah yeah, that's what I meant. We can perform one of the following
> sequences: grab 1 -> grab nr -> drop 1, or grab 1 -> grab nr - 1 if
> successful, drop 1 if failed.
>
> Seems straightforward to me, but yeah, it seems a bit hair-splitting of
> me to die on this hill :) Just thought it was weird seeing the other
> parts batchified, and one part wasn't.
>
> The rest LGTM - I'll defer to you and Johannes for further review.
>
> Reviewed-by: Nhat Pham <nphamcs@xxxxxxxxx>

Thanks Nhat! Sure, this sounds good.

I thought I would clarify my current thinking on this. As Yosry also
mentioned, batching the pool refs requires some more thought. It is tied
to the design approach of obtaining the objcg/pool refs per page before
adding them to the entry, which in turn needs to happen before adding
the entry to the xarray. I think there is some value in doing this
incrementally, because at any point, if storing a page fails:

1) All existing folio entries in the xarray will have a valid pool/objcg
   whose refs can be dropped when we unwind state.

2) There are no excess refs, potentially obtained by batching, that need
   to be dropped (handling these might make the code a little messy, imho).

Thanks,
Kanchana
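
For concreteness, here is a minimal, self-contained userspace sketch of
the two ref-acquisition orderings Nhat described above: (A) grab 1 ->
grab nr -> drop 1, and (B) grab 1 -> grab nr - 1 on success, drop 1 on
failure. All names here (struct pool, pool_get_many(), store_page()) are
hypothetical stand-ins for the real zswap pool/objcg refcount helpers,
and the failure path omits the xarray/entry cleanup the real code would
need; this illustrates only the orderings, not the actual patch.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for the zswap pool (or objcg): just a refcount. */
struct pool {
	atomic_long refcount;
};

static void pool_get_many(struct pool *p, long nr)
{
	atomic_fetch_add(&p->refcount, nr);
}

static void pool_put(struct pool *p)
{
	atomic_fetch_sub(&p->refcount, 1);
}

/* Stand-in for zswap_store_page(): pretend compression and the
 * xarray insert always succeed.
 */
static bool store_page(long i)
{
	(void)i;
	return true;
}

/* Ordering (A): grab 1 -> store all -> grab nr -> drop 1. */
static bool store_folio_a(struct pool *p, long nr)
{
	pool_get_many(p, 1);		/* keep the pool alive while storing */
	for (long i = 0; i < nr; i++) {
		if (!store_page(i)) {
			pool_put(p);	/* only the temporary ref to drop */
			return false;
		}
	}
	pool_get_many(p, nr);		/* one ref per stored entry */
	pool_put(p);			/* drop the temporary ref */
	return true;
}

/* Ordering (B): grab 1 -> store all -> grab nr - 1 on success,
 * so the temporary ref becomes the first entry's ref.
 */
static bool store_folio_b(struct pool *p, long nr)
{
	pool_get_many(p, 1);
	for (long i = 0; i < nr; i++) {
		if (!store_page(i)) {
			pool_put(p);
			return false;
		}
	}
	pool_get_many(p, nr - 1);
	return true;
}

int main(void)
{
	struct pool p = { .refcount = 0 };

	store_folio_a(&p, 4);
	printf("after A: %ld refs\n", atomic_load(&p.refcount));	/* 4 */
	store_folio_b(&p, 4);
	printf("after B: %ld refs\n", atomic_load(&p.refcount));	/* 8 */
	return 0;
}

Both orderings leave exactly nr refs per successfully stored folio; (B)
saves one atomic op by converting the temporary ref into the first
entry's ref, at the cost of an asymmetric error path.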