[..]
> > > > > > I am not closely following the latest changes so I am not sure. CCing
> > > > > > folks who have done work in that area recently.
> > > > > >
> > > > > > I am starting to think maybe it would be more reliable to just call
> > > > > > zswap_invalidate() for all freed swap entries anyway. Would that be
> > > > > > too expensive? We used to do that before the zswap_invalidate() call
> > > > > > was moved by commit 0827a1fb143f ("mm/zswap: invalidate zswap entry
> > > > > > when swap entry free"), and that was before we started using the
> > > > > > xarray (so it was arguably worse than it would be now).
> > > > >
> > > > > That might be a good idea; I suggest moving zswap_invalidate to
> > > > > swap_range_free and calling it for every freed slot.
> > > > >
> > > > > The patch below can be squashed into or put before "mm: attempt to batch
> > > > > free swap entries for zap_pte_range()".
> > > >
> > > > Hmm, on second thought, the commit message in the attached commit
> > > > might not be suitable; the current zswap_invalidate is also designed to
> > > > only work for order 0 zswap, so things are not clean even after this.
> > >
> > > Kairui, what about the below? We don't touch the path of
> > > __try_to_reclaim_swap(), where the entries are backed by one folio.
> > >
> > > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > > index c1638a009113..8ff58be40544 100644
> > > --- a/mm/swapfile.c
> > > +++ b/mm/swapfile.c
> > > @@ -1514,6 +1514,8 @@ static bool __swap_entries_free(struct swap_info_struct *si,
> > >  	unlock_cluster_or_swap_info(si, ci);
> > >
> > >  	if (!has_cache) {
> > > +		for (i = 0; i < nr; i++)
> > > +			zswap_invalidate(swp_entry(si->type, offset + i));
> > >  		spin_lock(&si->lock);
> > >  		swap_entry_range_free(si, entry, nr);
> > >  		spin_unlock(&si->lock);
> >
> > Hi Barry,
> >
> > Thanks for updating this thread. I'm thinking maybe this would be
> > better done on the zswap side?
> >
> > The concern with using zswap_invalidate is that it calls xa_erase, which
> > requires the xa spin lock. But if we call zswap_invalidate in
> > swap_entry_range_free, and ensure the slot is HAS_CACHE pinned, doing
> > a lockless read first with xa_load should be OK for checking whether the
> > slot needs a zswap invalidation. The performance cost will be minimal,
> > and we only need to call zswap_invalidate in one place, something like
> > this (haven't tested, comments are welcome). Also, zswap mTHP will
> > still store entries in order 0, so this should be OK for the future.
>
> While I do agree with this change on a high level, it's essentially
> reverting commit 0827a1fb143f ("mm/zswap: invalidate zswap entry when
> swap entry free"), which fixed a small problem with zswap writeback.
> I'd prefer that we don't if possible.
>
> One thing that I have always wanted to do is to pull some of the work done
> in swap_entry_range_free() and swap_range_free() before the slots
> caching layer: the memcg uncharging, clearing shadow entries from the
> swap cache, arch invalidation, zswap invalidation, etc. If we can have
> a hook for these pre-free callbacks, we can call it for single entries
> before we add them to the slots cache, and call it for the clusters
> as we do today. This should also reduce the amount of work done under
> the lock, and move more work to where the freeing is actually
> happening vs. the cache draining.
>
> I remember discussing this briefly with Ying before. Anyone have any
> thoughts?

Kairui, Barry, any thoughts on this?
Any preferences on how to make sure zswap_invalidate() is being called in all possible swap freeing paths?
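
For concreteness, here is a rough, untested sketch of the zswap-side check
Kairui describes above: have zswap_invalidate() do a lockless xa_load() peek
before the locked xa_erase(), so that calling it for every freed slot stays
cheap when nothing is stored there. It assumes the xarray-based zswap trees
in mm/zswap.c (swap_zswap_tree(), zswap_entry_free()) and is only meant to
illustrate the idea, not to stand in for Kairui's actual patch:

void zswap_invalidate(swp_entry_t swp)
{
	pgoff_t offset = swp_offset(swp);
	struct xarray *tree = swap_zswap_tree(swp);
	struct zswap_entry *entry;

	/*
	 * Lockless peek: the caller still pins the slot with SWAP_HAS_CACHE,
	 * so no new entry can be stored at this offset concurrently.  If
	 * nothing is stored, skip xa_erase() and its lock entirely.
	 */
	if (!xa_load(tree, offset))
		return;

	entry = xa_erase(tree, offset);
	if (entry)
		zswap_entry_free(entry);
}

With a check like that in place, the free paths could call zswap_invalidate()
unconditionally without worrying about the xarray lock cost for entries that
were never stored in zswap.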
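
And a very rough shape of the pre-free hook Yosry mentions, just to make that
option concrete as well. The helper name below is made up, and the exact set
of calls would need auditing; the point is only that per-entry teardown that
does not need si->lock could be invoked both for single entries before they
enter the slots cache and for whole clusters as we do today:

/* Hypothetical helper, name invented for illustration. */
static void swap_entries_free_prepare(struct swap_info_struct *si,
				      swp_entry_t entry, unsigned int nr)
{
	unsigned long offset = swp_offset(entry);
	unsigned int i;

	/* Per-slot teardown that does not need si->lock. */
	for (i = 0; i < nr; i++) {
		zswap_invalidate(swp_entry(si->type, offset + i));
		arch_swap_invalidate_page(si->type, offset + i);
	}
	/*
	 * Clearing shadow entries from the swap cache could move here too,
	 * leaving swap_entry_range_free() with only the work that really
	 * needs si->lock.
	 */
	mem_cgroup_uncharge_swap(entry, nr);
}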