[..]
> > > I don't think we want to stop doing exclusive loads in zswap due to this
> > > interaction with zram, which shouldn't be common.
> > >
> > > I think we can solve this by just writing the folio back to zswap upon
> > > failure as I mentioned.
> >
> > Instead of storing again, can we avoid invalidating the entry in the
> > first place if the load is not "exclusive"?
> >
> > The reason for exclusive loads is that the ownership is transferred to
> > the swapcache, so there is no point in keeping our copy. With an
> > optimistic read that doesn't transfer ownership, this doesn't
> > apply. And we can easily tell inside zswap_load() if we're dealing
> > with a swapcache read or not by testing the folio.
> >
> > The synchronous read already has to pin the swp_entry_t to be safe,
> > using swapcache_prepare(). That blocks __read_swap_cache_async(), which
> > means no other (exclusive) loads and no invalidates can occur.
> >
> > The zswap entry is freed during the regular swap_free() path, which
> > the sync fault calls on success. Otherwise we keep it.
>
> I thought about this, but I was particularly worried about the need to
> bring back the refcount that was removed when we switched to only
> supporting exclusive loads:
> https://lore.kernel.org/lkml/20240201-b4-zswap-invalidate-entry-v2-6-99d4084260a0@xxxxxxxxxxxxx/
>
> It seems to me that we don't need it, because swap_free() will free
> the entry, as you mentioned, before anyone else has the chance to load
> it or invalidate it. Writeback used to grab a reference as well, but
> it removes the entry from the tree anyway and takes full ownership of
> it, then frees it, so that should be okay.
>
> It makes me nervous, though, to be honest. For example, not long ago
> swap_free() didn't call zswap_invalidate() directly (it used to happen
> during swap slots cache draining). Without it, a subsequent load could
> race with writeback without refcount protection, right? We would need
> to make sure to backport 0827a1fb143f ("mm/zswap: invalidate zswap
> entry when swap entry free") with the fix to stable, for instance.
>
> I can't find a problem with your diff, but it just makes me nervous to
> have non-exclusive loads without a refcount.
>
> >
> > diff --git a/mm/zswap.c b/mm/zswap.c
> > index 535c907345e0..686364a6dd86 100644
> > --- a/mm/zswap.c
> > +++ b/mm/zswap.c
> > @@ -1622,6 +1622,7 @@ bool zswap_load(struct folio *folio)
> >  	swp_entry_t swp = folio->swap;
> >  	pgoff_t offset = swp_offset(swp);
> >  	struct page *page = &folio->page;
> > +	bool swapcache = folio_test_swapcache(folio);
> >  	struct zswap_tree *tree = swap_zswap_tree(swp);
> >  	struct zswap_entry *entry;
> >  	u8 *dst;
> > @@ -1634,7 +1635,8 @@ bool zswap_load(struct folio *folio)
> >  		spin_unlock(&tree->lock);
> >  		return false;
> >  	}
> > -	zswap_rb_erase(&tree->rbroot, entry);
> > +	if (swapcache)
> > +		zswap_rb_erase(&tree->rbroot, entry);

On second thought, if we don't remove the entry from the tree here,
writeback could free the entry from under us after we drop the lock
here, right?

> >  	spin_unlock(&tree->lock);
> >
> >  	if (entry->length)
> > @@ -1649,9 +1651,10 @@ bool zswap_load(struct folio *folio)
> >  	if (entry->objcg)
> >  		count_objcg_event(entry->objcg, ZSWPIN);
> >
> > -	zswap_entry_free(entry);
> > -
> > -	folio_mark_dirty(folio);
> > +	if (swapcache) {
> > +		zswap_entry_free(entry);
> > +		folio_mark_dirty(folio);
> > +	}
> >
> >  	return true;
> >  }
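
To make the race in that last question concrete, here is a minimal userspace
sketch (not kernel code; the one-slot "tree", the thread bodies, and the
timing are all made up for illustration). The loader thread stands in for a
hypothetical non-exclusive zswap_load() that finds the entry but leaves it in
the tree and drops the lock before decompressing; the writeback thread stands
in for writeback, which erases the entry from the tree and frees it. Built
with -fsanitize=address, the loader's late read will typically be reported as
a use-after-free, which is the window being asked about.

/*
 * Illustrative sketch only, assuming a non-exclusive load that leaves the
 * entry in the tree and drops the lock before using it. Names do not
 * correspond to real zswap internals.
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

struct fake_entry {
        char data[64];                  /* stands in for the compressed object */
};

static pthread_mutex_t tree_lock = PTHREAD_MUTEX_INITIALIZER;
static struct fake_entry *tree_slot;    /* one-slot "tree" for the demo */

static void *loader(void *arg)
{
        struct fake_entry *entry;

        (void)arg;
        pthread_mutex_lock(&tree_lock);
        entry = tree_slot;              /* found, but NOT erased (non-exclusive) */
        pthread_mutex_unlock(&tree_lock);

        usleep(1000);                   /* simulate decompression taking a while */

        /* By now writeback may have erased and freed the entry. */
        printf("loader sees: %s\n", entry->data);       /* potential use-after-free */
        return NULL;
}

static void *writeback(void *arg)
{
        struct fake_entry *entry;

        (void)arg;
        pthread_mutex_lock(&tree_lock);
        entry = tree_slot;
        tree_slot = NULL;               /* erase from the tree, take ownership */
        pthread_mutex_unlock(&tree_lock);

        free(entry);                    /* gone while the loader may still use it */
        return NULL;
}

int main(void)
{
        pthread_t t1, t2;

        tree_slot = malloc(sizeof(*tree_slot));
        strcpy(tree_slot->data, "compressed page");

        pthread_create(&t1, NULL, loader, NULL);
        pthread_create(&t2, NULL, writeback, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;
}

The quoted diff avoids this for swapcache loads by erasing the entry under
tree->lock before unlocking; the open question above is whether the
non-swapcache path can get equivalent protection without bringing back the
per-entry refcount.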