On Tue, Sep 24, 2024 at 1:51 PM Sridhar, Kanchana P <kanchana.p.sridhar@xxxxxxxxx> wrote: > > > This is an excellent point. Thanks Nhat for catching this! I can see two > options to solving this: > > Option 1: If zswap_mthp_enabled is "false", delete all stored offsets > for the mTHP in zswap before exiting. This could race with writeback > (either one or more subpages could be written back before zswap_store > acquires the tree lock), however, I don't think it will cause data inconsistencies. > Any offsets for subpages not written back will be deleted from zswap, > zswap_store() will return false, and the backing swap device's subsequent > swapout will over-write the zswap write-back data. Could anything go wrong > with this? I think this should be safe, albeit a bit awkward. At this point (zswap_store()), we should have the folio added to to swap cache, and locked. All the associated swap entries will point to this same (large) folio. Any concurrent zswap writeback attempt, even on a tail page, should get that folio when it calls __read_swap_cache_async(), and with page_allocated == false, and should short circuit. So I don't think we will race with zswap_writeback(). Yosry, Chengming, Johannes, any thoughts? > > Option 2: Only provide a build config option, > CONFIG_ZSWAP_STORE_THP_DEFAULT_ON, that cannot be dynamically changed. This can be a last resort thing, if the above doesn't work. Not the end of the world, but not ideal :) > > Would appreciate suggestions on these, and other potential solutions. > > Thanks, > Kanchana