Re: mm/truncate.c:669 VM_BUG_ON_FOLIO() - hit on XFS on different tests

On Fri, Dec 08, 2023 at 02:39:36PM -0800, Luis Chamberlain wrote:
> Commit aa5b9178c0190 ("mm: invalidation check mapping before folio_contains"),
> added in v6.6-rc1, moved the VM_BUG_ON_FOLIO() in invalidate_inode_pages2_range()
> after the truncation check.
> 
> We managed to hit this VM_BUG_ON_FOLIO() a few times on v6.6-rc5 with a slew
> of fstests runs on kdevops [0], using the XFS configs defined by kdevops'
> XFS configurations [1], with the following failure rates annotated:
> 
>   * xfs_reflink_4k: F:1/278 - one out of 278 times
>     - generic/451: (trace pasted below after running the test for over 17 hours)
>   * xfs_nocrc_4k: F:1/1604 - one out of 1604 times
>     - generic/451: https://gist.github.com/mcgrof/2c40a14979ceeb7321d2234a525c32a6
> 
> To be clear, F:1/1604 means you can run the test in a loop and on roughly
> the 1604th run you may hit the bug. It would seem Zorro had also hit this
> with a 64k directory size (mkfs.xfs -n size=65536) on v5.19-rc2, so prior
> to Hugh's move of the VM_BUG_ON_FOLIO(), while testing generic/132 [0].
> 
> My hope is that this helps those interested in reproducing it: spin up
> kdevops and just run the test in a loop in the same way. Likewise, if you
> have a fix we can test it as well, but it will take a while, as we want to
> run the test in a loop many times over.

I'm pretty sure this is the same problem recently diagnosed by Charan.
It's terribly rare, so it'll take a while to find out.  Try the attached
patch?

From 4bd18e281a5e99f3cc55a9c9cc78cbace4e9a504 Mon Sep 17 00:00:00 2001
From: Charan Teja Kalla <quic_charante@xxxxxxxxxxx>
Date: Sat, 9 Dec 2023 00:39:26 -0500
Subject: [PATCH] mm: Migrate high-order folios in swap cache correctly

Large folios occupy N consecutive entries in the swap cache
instead of using multi-index entries like the page cache.
However, if a large folio is re-added to the LRU list, it can
be migrated.  The migration code was not aware of the difference
between the swap cache and the page cache and assumed that a single
xas_store() would be sufficient.

This leaves potentially many stale pointers to the now-migrated folio
in the swap cache, which can lead to almost arbitrary data corruption
in the future.  This can also manifest as infinite loops with the
RCU read lock held.

Signed-off-by: Charan Teja Kalla <quic_charante@xxxxxxxxxxx>
[modifications to the changelog & tweaked the fix]
Signed-off-by: Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx>
---
 mm/migrate.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index d9d2b9432e81..2d67ca47d2e2 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -405,6 +405,7 @@ int folio_migrate_mapping(struct address_space *mapping,
 	int dirty;
 	int expected_count = folio_expected_refs(mapping, folio) + extra_count;
 	long nr = folio_nr_pages(folio);
+	long entries, i;
 
 	if (!mapping) {
 		/* Anonymous page without mapping */
@@ -442,8 +443,10 @@ int folio_migrate_mapping(struct address_space *mapping,
 			folio_set_swapcache(newfolio);
 			newfolio->private = folio_get_private(folio);
 		}
+		entries = nr;
 	} else {
 		VM_BUG_ON_FOLIO(folio_test_swapcache(folio), folio);
+		entries = 1;
 	}
 
 	/* Move dirty while page refs frozen and newpage not yet exposed */
@@ -453,7 +456,11 @@ int folio_migrate_mapping(struct address_space *mapping,
 		folio_set_dirty(newfolio);
 	}
 
-	xas_store(&xas, newfolio);
+	/* Swap cache still stores N entries instead of a high-order entry */
+	for (i = 0; i < entries; i++) {
+		xas_store(&xas, newfolio);
+		xas_next(&xas);
+	}
 
 	/*
 	 * Drop cache reference from old page by unfreezing
-- 
2.42.0
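
For anyone unfamiliar with the asymmetry the changelog describes, here is a
rough sketch of my own (simplified, not lifted from the patch or from kernel
sources; swap_mapping, first_index, mapping and index are placeholder names,
and locking, error handling and shadow entries are omitted) of how a large
folio of nr pages sits in the swap cache versus the page cache:

	long i, nr = folio_nr_pages(folio);
	XA_STATE(swap_xas, &swap_mapping->i_pages, first_index);
	XA_STATE_ORDER(file_xas, &mapping->i_pages, index, folio_order(folio));

	/* Swap cache: one XArray slot per subpage of the large folio */
	for (i = 0; i < nr; i++) {
		xas_store(&swap_xas, folio);
		xas_next(&swap_xas);
	}

	/* Page cache: a single multi-index entry covers all nr indices */
	xas_store(&file_xas, folio);

A lone xas_store() in folio_migrate_mapping() therefore replaces only one of
the nr swap-cache slots, leaving the rest pointing at the already-migrated
folio; the loop over 'entries' in the patch above repeats the store for each
slot, while folios that are not swap-backed keep entries == 1 and behave as
before.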

