Re: xfs clones crash issue - illegal state 13 in block map

"Darrick J. Wong" <darrick.wong@xxxxxxxxxx> · Thu, 7 Sep 2017 09:13:30 -0700

On Thu, Sep 07, 2017 at 03:58:56PM +0300, Amir Goldstein wrote:
> Hi guys,
> 
> I am getting these errors often when running the crash tests
> with cloned files (generic/502 in my xfstests patches).
> 
> Hitting these errors requires first fixing two other issues
> that mask this one:
> "xfs: fix incorrect log_flushed on fsync" (in master)
> "xfs: fix leftover CoW extent after truncate"
> available on my tree based on Darrick's simple fix:
> https://github.com/amir73il/linux/commits/xfs-fsync
> 
> I get the errors more often (1 out of 5) on a 100G fs on spinning disk.
> On a 10G fs on SSD they are less frequent.
> The log in this email was captured on patched stable 4.9.47 kernel,
> but I am getting the same errors on patched upstream kernel.
> 
> I wasn't able to create a deterministic reproducer, so I am attaching
> the full log from a failed test along with an IO log that can be
> replayed on your disk to examine the outcome.
> 
> The following is the output of fsx process #5, which is the process
> that wrote the problematic testfile5.mark0 to the log.
> This process performs only read, zero and fsync before creating
> the log mark.
> The file testfile5 was cloned from an original 256K file before
> running fsx.
> Later, I used the random seed 35484 from this log for all
> processes, and it seemed to increase the probability of failure.
> 
> # /old/home/amir/src/xfstests-dev/ltp/fsx -N 100 -d -k -P \
>     /mnt/test/fsxtests -i /dev/mapper/logwrites-test -S 0 -j 5 \
>     /mnt/scratch/testfile5
> Seed set to 35484
> file_size=262144
> 5: 1 read 0x3f959 thru 0x3ffff (0x6a7 bytes)
> 5: 2 zero from 0x3307e to 0x34f74, (0x1ef6 bytes)
> 5: 3 fsync
> 5: Dumped fsync buffer to testfile5.mark0
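The offsets in the fsx log above can be sanity-checked with a little arithmetic. With file_size=262144 (0x40000), the read ends at 0x3ffff, the last byte of the file, and 0x3ffff - 0x3f959 + 1 = 0x6a7, so "thru" appears to be an inclusive end. The zero length 0x1ef6 equals 0x34f74 - 0x3307e, so "to" appears to be an exclusive end. The helpers below are hypothetical names (not part of fsx) that encode that reading; the 4096-byte block size is an assumption, since the filesystem block size is not stated in the log.

```c
#include <stdint.h>

/* Length of a range whose end offset is inclusive ("read A thru B"). */
uint64_t len_thru(uint64_t start, uint64_t last)
{
	return last - start + 1;
}

/* Length of a range whose end offset is exclusive ("zero from A to B"). */
uint64_t len_to(uint64_t start, uint64_t end)
{
	return end - start;
}

/* Index of the block containing byte offset off, for block size blksz. */
uint64_t block_of(uint64_t off, uint64_t blksz)
{
	return off / blksz;
}
```

Assuming 4 KiB blocks, block_of(0x3307e, 4096) is 51 and block_of(0x34f74 - 1, 4096) is 52, and neither end offset is block aligned, so the zero operation only partially covers both blocks it touches.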
> 
> In order to get to the crash state you need to get my
> xfstests replay-log patches and replay the attached log
> on a >= 100G scratch device:
> 
> # ./src/log-writes/replay-log --log log.xfs.testfile5.mark0 --replay \
>     $SCRATCH_DEV --end-mark testfile5.mark0
> # mount $SCRATCH_DEV $SCRATCH_MNT
> # umount $SCRATCH_MNT
> # xfs_repair -n $SCRATCH_DEV
> Phase 1 - find and verify superblock...
> Phase 2 - using internal log
>         - zero log...
>         - scan filesystem freespace and inode maps...
>         - found root inode chunk
> Phase 3 - for each AG...
>         - scan (but don't clear) agi unlinked lists...
>         - process known inodes and perform inode discovery...
>         - agno = 0
> 
> fatal error -- illegal state 13 in block map 376
> 
> Can anyone provide some insight?

Looks like I missed a couple of extent states in process_bmbt_reclist_int.

What happens if you add the following (only compile tested) patch to
xfsprogs?

(Normally I'd say send a metadump too for us mere mortals to work with,
though I'm about to plunge into weddingland so I likely won't be able to
do much until the 18th.)

((Eric: If this doesn't turn out to be a totally garbage patch, feel
free to add it to xfsprogs.))

--D

xfs_repair: handle missing extent states

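process_bmbt_reclist_int() does not handle two of the newer block states,
XR_E_REFC and XR_E_COW, so xfs_repair aborts with "illegal state" when a
fork maps one of those blocks. Add cases for them so that they are
reported as warnings instead.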

Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
---
 repair/dinode.c |    7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/repair/dinode.c b/repair/dinode.c
index f817b5a..b35a523 100644
--- a/repair/dinode.c
+++ b/repair/dinode.c
@@ -796,6 +796,7 @@ _("%s fork in ino %" PRIu64 " claims free block %" PRIu64 "\n"),
 			case XR_E_FS_MAP:
 			case XR_E_INO:
 			case XR_E_INUSE_FS:
+			case XR_E_REFC:
 				do_warn(
 _("%s fork in inode %" PRIu64 " claims metadata block %" PRIu64 "\n"),
 					forkname, ino, b);
@@ -812,6 +813,12 @@ _("%s fork in %s inode %" PRIu64 " claims used block %" PRIu64 "\n"),
 					forkname, ftype, ino, b);
 				goto done;
 
+			case XR_E_COW:
+				do_warn(
+_("%s fork in %s inode %" PRIu64 " claims CoW block %" PRIu64 "\n"),
+					forkname, ftype, ino, b);
+				goto done;
+
 			default:
 				do_error(
 _("illegal state %d in block map %" PRIu64 "\n"),
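The effect of the two new case labels can be modelled outside xfsprogs. This is an illustration, not the patch: enum blk_state, classify() and aborts_repair() are hypothetical names, the BS_* values stand in for the XR_E_* constants shown in the diff (the real enum has more members and its own numeric values), and the model covers only the cases visible in the hunks: the metadata group, the CoW case and the default do_error() path.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the XR_E_* block states seen in the diff. */
enum blk_state {
	BS_FS_MAP,	/* models XR_E_FS_MAP */
	BS_INO,		/* models XR_E_INO */
	BS_INUSE_FS,	/* models XR_E_INUSE_FS */
	BS_REFC,	/* models XR_E_REFC (new case, first hunk) */
	BS_COW,		/* models XR_E_COW (new case, second hunk) */
	BS_OTHER	/* any state the switch does not recognise */
};

enum verdict {
	V_METADATA,	/* do_warn(): "claims metadata block" */
	V_COW,		/* do_warn(): "claims CoW block" */
	V_ILLEGAL	/* do_error(): "illegal state", repair aborts */
};

/* Model of the switch; `patched` toggles the two case labels the patch adds. */
enum verdict classify(enum blk_state s, bool patched)
{
	switch (s) {
	case BS_FS_MAP:
	case BS_INO:
	case BS_INUSE_FS:
		return V_METADATA;
	case BS_REFC:
		/* Grouped with the other metadata states once patched. */
		return patched ? V_METADATA : V_ILLEGAL;
	case BS_COW:
		/* Gets its own warning once patched. */
		return patched ? V_COW : V_ILLEGAL;
	default:
		/* Unrecognised states still reach do_error(). */
		return V_ILLEGAL;
	}
}

/* True if xfs_repair would abort on a block in state s. */
bool aborts_repair(enum blk_state s, bool patched)
{
	return classify(s, patched) == V_ILLEGAL;
}
```

Without the new labels, a fork that maps a refcount btree block or a leftover CoW block falls through to the default branch and xfs_repair stops; with them, both are reported with do_warn() and the code jumps to the done label instead of aborting.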