[PATCH 1/3] xfs/178: don't fail when SCRATCH_DEV contains random xfs superblocks

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



From: Darrick J. Wong <djwong@xxxxxxxxxx>

When I added an fstests config for "RAID" striping (aka MKFS_OPTIONS='-d
su=128k,sw=4'), I suddenly started seeing this test fail sporadically
with:

--- /tmp/fstests/tests/xfs/178.out	2023-07-11 12:18:21.714970364 -0700
+++ /var/tmp/fstests/xfs/178.out.bad	2023-07-25 22:05:39.756000000 -0700
@@ -10,6 +10,20 @@ bad primary superblock - bad magic numbe

 attempting to find secondary superblock...
 found candidate secondary superblock...
+unable to verify superblock, continuing...
+found candidate secondary superblock...
+error reading superblock 1 -- seek to offset 584115421184 failed
+unable to verify superblock, continuing...
+found candidate secondary superblock...
+error reading superblock 1 -- seek to offset 584115421184 failed
+unable to verify superblock, continuing...
+found candidate secondary superblock...
+error reading superblock 1 -- seek to offset 584115421184 failed
+unable to verify superblock, continuing...
+found candidate secondary superblock...
+error reading superblock 1 -- seek to offset 584115421184 failed
+unable to verify superblock, continuing...
+found candidate secondary superblock...
+error reading superblock 1 -- seek to offset 584115421184 failed
+unable to verify superblock, continuing...
+found candidate secondary superblock...
+error reading superblock 1 -- seek to offset 584115421184 failed
+unable to verify superblock, continuing...
+found candidate secondary superblock...
 verified secondary superblock...
 writing modified primary superblock
 sb root inode INO inconsistent with calculated value INO

Eventually I tracked this down to a mis-interaction between the test,
xfs_repair, and the storage device.

The storage advertises SCSI UNMAP support, but it is of the variety
where the UNMAP command returns immediately but takes its time to unmap
in the background.  Subsequent rereads are allowed to return stale
contents, per DISCARD semantics.

When the fstests cloud is not busy, the old contents disappear in a few
seconds.  However, at peak utilization, there are ~75 VMs running, and
the storage backend can take several minutes to commit these background
requests.

When we zero the primary super and start xfs_repair on SCRATCH_DEV, it
will walk the device looking for secondary supers.  Most of the time it
finds the actual AG 1 secondary super, but sometimes it finds ghosts
from previous formats.  When that happens, xfs_repair will talk quite a
bit about those failed secondaries, even if it eventually finds an
acceptable secondary sb and completes the repair.

Filter out the messages about secondary supers.

Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx>
---
 tests/xfs/178     |    9 ++++++++-
 tests/xfs/178.out |    2 --
 2 files changed, 8 insertions(+), 3 deletions(-)


diff --git a/tests/xfs/178 b/tests/xfs/178
index a65197cde3..fee1e92bf3 100755
--- a/tests/xfs/178
+++ b/tests/xfs/178
@@ -10,13 +10,20 @@
 . ./common/preamble
 _begin_fstest mkfs other auto
 
+filter_repair() {
+	_filter_repair | sed \
+		-e '/unable to verify superblock, continuing/d' \
+		-e '/found candidate secondary superblock/d' \
+		-e '/error reading superblock.*-- seek to offset/d'
+}
+
 # dd the 1st sector then repair
 _dd_repair_check()
 {
 	#dd first sector
 	dd if=/dev/zero of=$1 bs=$2 count=1 2>&1 | _filter_dd
 	#xfs_repair
-	_scratch_xfs_repair 2>&1 | _filter_repair
+	_scratch_xfs_repair 2>&1 | filter_repair
 	#check repair
 	if _check_scratch_fs; then
         	echo "repair passed"
diff --git a/tests/xfs/178.out b/tests/xfs/178.out
index 0bebe553eb..711e90cc26 100644
--- a/tests/xfs/178.out
+++ b/tests/xfs/178.out
@@ -9,7 +9,6 @@ Phase 1 - find and verify superblock...
 bad primary superblock - bad magic number !!!
 
 attempting to find secondary superblock...
-found candidate secondary superblock...
 verified secondary superblock...
 writing modified primary superblock
 sb root inode INO inconsistent with calculated value INO
@@ -45,7 +44,6 @@ Phase 1 - find and verify superblock...
 bad primary superblock - bad magic number !!!
 
 attempting to find secondary superblock...
-found candidate secondary superblock...
 verified secondary superblock...
 writing modified primary superblock
 sb root inode INO inconsistent with calculated value INO




[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux