> -----Original Message-----
> From: Yosry Ahmed <yosryahmed@xxxxxxxxxx>
> Sent: Wednesday, September 25, 2024 9:52 PM
> To: Sridhar, Kanchana P <kanchana.p.sridhar@xxxxxxxxx>
> Cc: Johannes Weiner <hannes@xxxxxxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx;
> linux-mm@xxxxxxxxx; nphamcs@xxxxxxxxx; chengming.zhou@xxxxxxxxx;
> usamaarif642@xxxxxxxxx; shakeel.butt@xxxxxxxxx; ryan.roberts@xxxxxxx;
> Huang, Ying <ying.huang@xxxxxxxxx>; 21cnbao@xxxxxxxxx; akpm@linux-
> foundation.org; Zou, Nanhai <nanhai.zou@xxxxxxxxx>; Feghali, Wajdi K
> <wajdi.k.feghali@xxxxxxxxx>; Gopal, Vinodh <vinodh.gopal@xxxxxxxxx>
> Subject: Re: [PATCH v7 6/8] mm: zswap: Support mTHP swapout in
> zswap_store().
>
> [..]
> >
> > One thing I realized while reworking the patches for the batched checks is:
> > within zswap_store_page(), we set the entry->objcg and entry->pool before
> > adding it to the xarray. Given this, wouldn't it be safer to get the objcg
> > and pool reference per sub-page, locally in zswap_store_page(), rather than
> > obtaining batched references at the end if the store is successful? If we want
> > zswap_store_page() to be self-contained and correct as far as the entry
> > being created and added to the xarray, it seems like the right thing to do?
> > I am a bit apprehensive about the entry being added to the xarray without
> > a reference obtained on the objcg and pool, because any page-faults/writeback
> > that occur on sub-pages added to the xarray before the entire folio has been
> > stored, would run into issues.
>
> We definitely should not obtain references to the pool and objcg after
> initializing the entries with them. We can obtain all references in
> zswap_store() before zswap_store_page(). IOW, the batching in this
> case should be done before the per-page operations, not after.

Thanks Yosry. IIUC, we should obtain all references to the objcg and to the
zswap_pool at the start of zswap_store().

In the case of an error on any sub-page, we will need to unwind state,
potentially for only the sub-pages stored so far, or for the entire folio if
it happened to already be in zswap and is being re-written. We might need
some additional book-keeping to track how many sub-pages were found in the
xarray and had zswap_entry_free() called on them (call this nr_sb). Assuming
I define a new "obj_cgroup_put_many()", I would need to call it with
(folio_nr_pages() - nr_sb).

As for zswap_pool_get(), there is some added complexity if we want to keep
the existing implementation that calls "percpu_ref_tryget()", assuming it is
extended to provide a new "zswap_pool_get_many()" that calls
"percpu_ref_tryget_many()". Is there a reason we use percpu_ref_tryget()
instead of percpu_ref_get()? The reason I ask is that with tryget(), if for
some reason pool->ref is 0, no further increments will be made. If so, upon
unwinding state in zswap_store(), I would need to special-case this before
calling a new "zswap_pool_put_many()". Things could be a little simpler if
zswap_pool_get() could use "percpu_ref_get()", which always increments the
refcount. Since the zswap pool->ref is initialized to "1", this seems Ok,
but I don't know if there would be unintended consequences.
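
For concreteness, here is a rough sketch of the new helpers I have in mind.
The names are tentative, and the sketch assumes the current percpu_ref
members (the zswap pool's "ref" and the objcg's "refcnt"); it simply mirrors
the existing zswap_pool_get()/obj_cgroup_get_many() on top of the
percpu_ref_*_many() primitives:

/* mm/zswap.c: batched counterparts of zswap_pool_get()/zswap_pool_put(). */
static bool zswap_pool_get_many(struct zswap_pool *pool, unsigned long nr)
{
	return percpu_ref_tryget_many(&pool->ref, nr);
}

static void zswap_pool_put_many(struct zswap_pool *pool, unsigned long nr)
{
	percpu_ref_put_many(&pool->ref, nr);
}

/* include/linux/memcontrol.h: counterpart of obj_cgroup_get_many(). */
static inline void obj_cgroup_put_many(struct obj_cgroup *objcg,
				       unsigned long nr)
{
	percpu_ref_put_many(&objcg->refcnt, nr);
}

In zswap_store(), the references would then be taken up front and dropped on
the error path, roughly like this (schematic only, with nr_pages =
folio_nr_pages(folio) and nr_sb tracked as described above):

	/* Before the per-page loop: */
	if (objcg)
		obj_cgroup_get_many(objcg, nr_pages);
	if (!zswap_pool_get_many(pool, nr_pages))
		goto put_objcg;

	...

	/* Error path: drop the refs not already dropped via zswap_entry_free(): */
	if (objcg)
		obj_cgroup_put_many(objcg, nr_pages - nr_sb);
	zswap_pool_put_many(pool, nr_pages - nr_sb);
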
Can you please advise on which is the simplest/cleanest approach:

1) Proceed with the above changes without changing percpu_ref_tryget() in
   zswap_pool_get(). This needs special-casing in zswap_store() to detect
   pool->ref being "0" before calling zswap_pool_put[_many]().

2) Modify zswap_pool_get()/zswap_pool_get_many() to use
   percpu_ref_get[_many]() and avoid the special-casing to detect pool->ref
   being "0" before calling zswap_pool_put[_many]().

3) Keep the approach in v7, where obj_cgroup_get/put is localized to
   zswap_store_page() for both success and error conditions, and any
   unwinding of state in zswap_store() will take care of dropping references
   obtained from prior successful writes (from this or prior invocations of
   zswap_store()).

Thanks,
Kanchana

>
> >
> > Just wanted to run this by you. The rest of the batched charging, atomic
> > and stat updates should be Ok.
> >
> > Thanks,
> > Kanchana
> >
> > >
> > > Thanks,
> > > Kanchana