Re: [syzbot] [mm?] WARNING in zswap_swapoff

Yosry Ahmed <yosryahmed@xxxxxxxxxx> · Tue, 3 Sep 2024 11:21:56 -0700

[..]
> > > > > > I am not closely following the latest changes so I am not sure. CCing
> > > > > > folks who have done work in that area recently.
> > > > > >
> > > > > > I am starting to think maybe it would be more reliable to just call
> > > > > > zswap_invalidate() for all freed swap entries anyway. Would that be
> > > > > > too expensive? We used to do that before the zswap_invalidate() call
> > > > > > was moved by commit 0827a1fb143f ("mm/zswap: invalidate zswap entry
> > > > > > when swap entry free"), and that was before we started using the
> > > > > > xarray (so it was arguably worse than it would be now).
> > > > > >
> > > > >
> > > > > That might be a good idea, I suggest moving zswap_invalidate to
> > > > > swap_range_free and call it for every freed slot.
> > > > >
> > > > > The patch below can be squashed into, or put before, "mm: attempt to
> > > > > batch free swap entries for zap_pte_range()".
> > > >
> > > > Hmm, on second thought, the commit message in the attached commit
> > > > might not be suitable: the current zswap_invalidate is also designed
> > > > to work only for order-0 zswap, so things are not clean even after this.
> > >
> > > Kairui, what about the below? That way we don't touch the
> > > __try_to_reclaim_swap() path, where a folio backs the entries.
> > >
> > > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > > index c1638a009113..8ff58be40544 100644
> > > --- a/mm/swapfile.c
> > > +++ b/mm/swapfile.c
> > > @@ -1514,6 +1514,8 @@ static bool __swap_entries_free(struct swap_info_struct *si,
> > >         unlock_cluster_or_swap_info(si, ci);
> > >
> > >         if (!has_cache) {
> > > +               for (i = 0; i < nr; i++)
> > > +                       zswap_invalidate(swp_entry(si->type, offset + i));
> > >                 spin_lock(&si->lock);
> > >                 swap_entry_range_free(si, entry, nr);
> > >                 spin_unlock(&si->lock);
> > >
> >
> > Hi Barry,
> >
> > Thanks for updating this thread. I'm wondering whether this would be
> > better done on the zswap side?
> >
> > The concern with using zswap_invalidate is that it calls xa_erase, which
> > requires the xa spin lock. But if we call zswap_invalidate in
> > swap_entry_range_free and ensure the slot is HAS_CACHE pinned, doing
> > a lockless read first with xa_load should be OK for checking whether the
> > slot needs a zswap invalidation. The performance cost will be minimal,
> > and we only need to call zswap_invalidate in one place, something like
> > this (haven't tested, comments are welcome). Also, zswap mTHP will
> > still store entries as order 0, so this should be OK for the future.
>
>
> While I do agree with this change on a high level, it's essentially
> reverting commit 0827a1fb143f ("mm/zswap: invalidate zswap entry when
> swap entry free") which fixed a small problem with zswap writeback.
> I'd prefer that we don't if possible.
>
> One thing I have always wanted to do is to pull some of the work done
> in swap_entry_range_free() and swap_range_free() before the slots
> caching layer. The memcg uncharging, clearing shadow entries from the
> swap cache, arch invalidation, zswap invalidation, etc. If we can have
> a hook for these pre-free callbacks we can call it for single entries
> before we add them to the slots cache, and call them for the clusters
> as we do today. This should also reduce the amount of work done under
> the lock, and move more work to where the freeing is actually
> happening vs. the cache draining.
>
> I remember discussing this briefly with Ying before. Anyone have any thoughts?

Kairui, Barry, any thoughts on this? Any preferences on how to make
sure zswap_invalidate() is being called in all possible swap freeing
paths?