> -----Original Message----- > From: Yosry Ahmed <yosryahmed@xxxxxxxxxx> > Sent: Tuesday, September 24, 2024 2:34 PM > To: Nhat Pham <nphamcs@xxxxxxxxx> > Cc: Sridhar, Kanchana P <kanchana.p.sridhar@xxxxxxxxx>; linux- > kernel@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx; hannes@xxxxxxxxxxx; > chengming.zhou@xxxxxxxxx; usamaarif642@xxxxxxxxx; > shakeel.butt@xxxxxxxxx; ryan.roberts@xxxxxxx; Huang, Ying > <ying.huang@xxxxxxxxx>; 21cnbao@xxxxxxxxx; akpm@xxxxxxxxxxxxxxxxxxxx; > Zou, Nanhai <nanhai.zou@xxxxxxxxx>; Feghali, Wajdi K > <wajdi.k.feghali@xxxxxxxxx>; Gopal, Vinodh <vinodh.gopal@xxxxxxxxx> > Subject: Re: [PATCH v7 6/8] mm: zswap: Support mTHP swapout in > zswap_store(). > > On Tue, Sep 24, 2024 at 2:08 PM Nhat Pham <nphamcs@xxxxxxxxx> wrote: > > > > On Tue, Sep 24, 2024 at 1:51 PM Sridhar, Kanchana P > > <kanchana.p.sridhar@xxxxxxxxx> wrote: > > > > > > > > > This is an excellent point. Thanks Nhat for catching this! I can see two > > > options to solving this: > > > > > > Option 1: If zswap_mthp_enabled is "false", delete all stored offsets > > > for the mTHP in zswap before exiting. This could race with writeback > > > (either one or more subpages could be written back before zswap_store > > > acquires the tree lock), however, I don't think it will cause data > inconsistencies. > > > Any offsets for subpages not written back will be deleted from zswap, > > > zswap_store() will return false, and the backing swap device's subsequent > > > swapout will over-write the zswap write-back data. Could anything go > wrong > > > with this? > > > > I think this should be safe, albeit a bit awkward. > > > > At this point (zswap_store()), we should have the folio added to to > > swap cache, and locked. All the associated swap entries will point to > > this same (large) folio. > > > > Any concurrent zswap writeback attempt, even on a tail page, should > > get that folio when it calls __read_swap_cache_async(), and with > > page_allocated == false, and should short circuit. > > > > So I don't think we will race with zswap_writeback(). > > > > Yosry, Chengming, Johannes, any thoughts? > > Why can't we just handle it the same way as we handle zswap > disablement? If it is disabled, we invalidate any old entries for the > offsets and return false to swapout to disk. > > Taking a step back, why do we need the runtime knob and config option? > Are there cases where we think zswapout of mTHPs will perform badly, > or is it just due to lack of confidence in the feature? Thanks Nhat and Yosry for the suggestions/comments. If I recall correctly, the topic of adding a config option/knob came up based on earlier data I had collected with a zram backing device setup, which showed a performance degradation with zstd, but not with deflate-iaa. Since the v7 data collected with an 823G SSD swap disk partition indicates that we get good throughput and latency improvements with zswap-mTHP with zstd and deflate-iaa, I am not sure if the knob is still required (if this is representative of most of the setups that use mTHP). I am confident about the zswap-mTHP feature itself, and don’t think the knob is needed from that perspective. I think the question is really about having the ability to disable zswap-mTHP in some existing setup where having mTHP enabled performs worse with this patchset than without. I am Ok with having the knob and handling it using Option 1, or, not having a knob. Thanks, Kanchana > > > > > > > > > Option 2: Only provide a build config option, > > > CONFIG_ZSWAP_STORE_THP_DEFAULT_ON, that cannot be dynamically > changed. > > > > This can be a last resort thing, if the above doesn't work. Not the > > end of the world, but not ideal :) > > > > > > > > Would appreciate suggestions on these, and other potential solutions. > > > > > > Thanks, > > > Kanchana