On Tue, Feb 6, 2024 at 10:51 AM Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
>
> On Tue, Feb 06, 2024 at 10:08:55AM -0800, Nhat Pham wrote:
> > When a folio is swapped in, the protection size of the corresponding
> > zswap LRU is incremented, so that the zswap shrinker is more
> > conservative with its reclaiming action. This field is embedded within
> > the struct lruvec, so updating it requires looking up the folio's memcg
> > and lruvec. However, currently this lookup can happen after the folio is
> > unlocked, for instance if a new folio is allocated, and
> > swap_read_folio() unlocks the folio before returning. In this scenario,
> > there is no stability guarantee for the binding between a folio and its
> > memcg and lruvec:
> >
> > * A folio's memcg and lruvec can be freed between the lookup and the
> >   update, leading to a UAF.
> > * Folio migration can clear the now-unlocked folio's memcg_data, which
> >   directs the zswap LRU protection size update towards the root memcg
> >   instead of the original memcg. This was recently picked up by the
> >   syzbot thanks to a warning in the inlined folio_lruvec() call.
> >
> > Move the zswap LRU protection range update above the swap_read_folio()
> > call, and only when a new page is allocated, to prevent this.
> >
> > Reported-by: syzbot+17a611d10af7d18a7092@xxxxxxxxxxxxxxxxxxxxxxxxx
> > Closes: https://lore.kernel.org/all/000000000000ae47f90610803260@xxxxxxxxxx/
> > Fixes: b5ba474f3f51 ("zswap: shrink zswap pool based on memory pressure")
> > Signed-off-by: Nhat Pham <nphamcs@xxxxxxxxx>
>
> Looks great, thanks for updating it!
>
> One more thing I just realized:
>
> > ---
> >  mm/swap_state.c | 10 ++++++----
> >  mm/zswap.c      |  1 +
> >  2 files changed, 7 insertions(+), 4 deletions(-)
> >
> > diff --git a/mm/swap_state.c b/mm/swap_state.c
> > index e671266ad772..7255c01a1e4e 100644
> > --- a/mm/swap_state.c
> > +++ b/mm/swap_state.c
> > @@ -680,9 +680,10 @@ struct folio *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
> >  	/* The page was likely read above, so no need for plugging here */
> >  	folio = __read_swap_cache_async(entry, gfp_mask, mpol, ilx,
> >  					&page_allocated, false);
> > -	if (unlikely(page_allocated))
> > +	if (unlikely(page_allocated)) {
> > +		zswap_folio_swapin(folio);
> >  		swap_read_folio(folio, false, NULL);
> > -	zswap_folio_swapin(folio);
> > +	}
> >  	return folio;
> >  }
> >
> > @@ -855,9 +856,10 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
> >  	/* The folio was likely read above, so no need for plugging here */
> >  	folio = __read_swap_cache_async(targ_entry, gfp_mask, mpol, targ_ilx,
> >  					&page_allocated, false);
> > -	if (unlikely(page_allocated))
> > +	if (unlikely(page_allocated)) {
> > +		zswap_folio_swapin(folio);
> >  		swap_read_folio(folio, false, NULL);
> > -	zswap_folio_swapin(folio);
> > +	}
> >  	return folio;
> >  }
> >
> > diff --git a/mm/zswap.c b/mm/zswap.c
> > index 4aea03285532..8c548f73d52e 100644
> > --- a/mm/zswap.c
> > +++ b/mm/zswap.c
> > @@ -827,6 +827,7 @@ void zswap_folio_swapin(struct folio *folio)
> >  	struct lruvec *lruvec;
> >
> >  	if (folio) {
> > +		VM_WARN_ON_ONCE(!folio_test_locked(folio));
> >  		lruvec = folio_lruvec(folio);
> >  		atomic_long_inc(&lruvec->zswap_lruvec_state.nr_zswap_protected);
> >  	}
>
> The NULL check is now also no longer necessary.
>
> It used to be called unconditionally, even if
> __read_swap_cache_async() failed and returned NULL.
>
> However, page_allocated == true implies success. That newly allocated
> and locked folio is always returned.

Ah yeah, I forgot the context of that :) Just sent a fixlet to do away with the check. Thanks for picking that out!
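
For context, with the NULL check dropped, zswap_folio_swapin() presumably
ends up looking roughly like the sketch below (reconstructed from the quoted
hunk above, not the actual posted fixlet):

void zswap_folio_swapin(struct folio *folio)
{
	struct lruvec *lruvec;

	/*
	 * Callers only invoke this when page_allocated == true, so the
	 * folio is the newly allocated, still-locked one and cannot be
	 * NULL here.
	 */
	VM_WARN_ON_ONCE(!folio_test_locked(folio));
	lruvec = folio_lruvec(folio);
	atomic_long_inc(&lruvec->zswap_lruvec_state.nr_zswap_protected);
}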