On Mon, Mar 25, 2024 at 10:23 AM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote: > > On Sun, Mar 24, 2024 at 2:04 PM Johannes Weiner <hannes@xxxxxxxxxxx> wrote: > > > > Zhongkun He reports data corruption when combining zswap with zram. > > > > The issue is the exclusive loads we're doing in zswap. They assume > > that all reads are going into the swapcache, which can assume > > authoritative ownership of the data and so the zswap copy can go. > > > > However, zram files are marked SWP_SYNCHRONOUS_IO, and faults will try > > to bypass the swapcache. This results in an optimistic read of the > > swap data into a page that will be dismissed if the fault fails due to > > races. In this case, zswap mustn't drop its authoritative copy. > > > > Link: https://lore.kernel.org/all/CACSyD1N+dUvsu8=zV9P691B9bVq33erwOXNTmEaUbi9DrDeJzw@xxxxxxxxxxxxxx/ > > Reported-by: Zhongkun He <hezhongkun.hzk@xxxxxxxxxxxxx> > > Fixes: b9c91c43412f ("mm: zswap: support exclusive loads") > > Cc: stable@xxxxxxxxxxxxxxx [6.5+] > > Signed-off-by: Johannes Weiner <hannes@xxxxxxxxxxx> > > Tested-by: Zhongkun He <hezhongkun.hzk@xxxxxxxxxxxxx> Acked-by: Barry Song <baohua@xxxxxxxxxx> > > Do we also want to mention somewhere (commit log or comment) that > keeping the entry in the tree is fine because we are still protected > from concurrent loads/invalidations/writeback by swapcache_prepare() > setting SWAP_HAS_CACHE or so? It seems that Kairui's patch comprehensively addresses the issue at hand. Johannes's solution, on the other hand, appears to align zswap behavior more closely with that of a traditional swap device, only releasing an entry when the corresponding swap slot is freed, particularly in the sync-io case. Johannes' patch has inspired me to consider whether zRAM could achieve a comparable outcome by immediately releasing objects in swap cache scenarios. When I have the opportunity, I plan to experiment with zRAM. > > Anyway, this LGTM. > > Acked-by: Yosry Ahmed <yosryahmed@xxxxxxxxxx> > > > --- > > mm/zswap.c | 23 +++++++++++++++++++---- > > 1 file changed, 19 insertions(+), 4 deletions(-) > > > > diff --git a/mm/zswap.c b/mm/zswap.c > > index 535c907345e0..41a1170f7cfe 100644 > > --- a/mm/zswap.c > > +++ b/mm/zswap.c > > @@ -1622,6 +1622,7 @@ bool zswap_load(struct folio *folio) > > swp_entry_t swp = folio->swap; > > pgoff_t offset = swp_offset(swp); > > struct page *page = &folio->page; > > + bool swapcache = folio_test_swapcache(folio); > > struct zswap_tree *tree = swap_zswap_tree(swp); > > struct zswap_entry *entry; > > u8 *dst; > > @@ -1634,7 +1635,20 @@ bool zswap_load(struct folio *folio) > > spin_unlock(&tree->lock); > > return false; > > } > > - zswap_rb_erase(&tree->rbroot, entry); > > + /* > > + * When reading into the swapcache, invalidate our entry. The > > + * swapcache can be the authoritative owner of the page and > > + * its mappings, and the pressure that results from having two > > + * in-memory copies outweighs any benefits of caching the > > + * compression work. > > + * > > + * (Most swapins go through the swapcache. The notable > > + * exception is the singleton fault on SWP_SYNCHRONOUS_IO > > + * files, which reads into a private page and may free it if > > + * the fault fails. We remain the primary owner of the entry.) > > + */ > > + if (swapcache) > > + zswap_rb_erase(&tree->rbroot, entry); > > spin_unlock(&tree->lock); > > > > if (entry->length) > > @@ -1649,9 +1663,10 @@ bool zswap_load(struct folio *folio) > > if (entry->objcg) > > count_objcg_event(entry->objcg, ZSWPIN); > > > > - zswap_entry_free(entry); > > - > > - folio_mark_dirty(folio); > > + if (swapcache) { > > + zswap_entry_free(entry); > > + folio_mark_dirty(folio); > > + } > > > > return true; > > } > > -- > > 2.44.0 Thanks Barry