On Thu, Sep 7, 2017 at 7:13 PM, Darrick J. Wong <darrick.wong@xxxxxxxxxx> wrote:
> On Thu, Sep 07, 2017 at 03:58:56PM +0300, Amir Goldstein wrote:
>> Hi guys,
>>
>> I am getting these errors often when running the crash tests
>> with cloned files (generic/502 in my xfstests patches).
>>
>> Hitting these errors requires first fixing 2 other issues
>> that mask this issue:
>> "xfs: fix incorrect log_flushed on fsync" (in master)
>> "xfs: fix leftover CoW extent after truncate"
>> available on my tree based on Darrick's simple fix:
>> https://github.com/amir73il/linux/commits/xfs-fsync
>>
>> I get the errors more often (1 out of 5) on a 100G fs on spinning disk.
>> On a 10G fs on SSD they are less frequent.
>> The log in this email was captured on a patched stable 4.9.47 kernel,
>> but I am getting the same errors on a patched upstream kernel.
>>
>> I wasn't able to create a deterministic reproducer, so I am attaching
>> the full log from a failed test along with an IO log that can be
>> replayed on your disk to examine the outcome.
>>
>> Following is the output of fsx process #5, which is the process
>> that wrote the problematic testfile5.mark0 to the log.
>> This process performs only read, zero and fsync operations before
>> creating the log mark.
>> The file testfile5 was cloned from an original 256K file before
>> running fsx.
>> Later, I used the random seed 35484 from this log for all
>> processes, and it seemed to increase the probability of failure.
>>
>> # /old/home/amir/src/xfstests-dev/ltp/fsx -N 100 -d -k \
>>     -P /mnt/test/fsxtests -i /dev/mapper/logwrites-test -S 0 -j 5 \
>>     /mnt/scratch/testfile5
>> Seed set to 35484
>> file_size=262144
>> 5: 1 read 0x3f959 thru 0x3ffff (0x6a7 bytes)
>> 5: 2 zero from 0x3307e to 0x34f74, (0x1ef6 bytes)
>> 5: 3 fsync
>> 5: Dumped fsync buffer to testfile5.mark0
>>
>> In order to get to the crash state you need to get my
>> xfstests replay-log patches and replay the attached log
>> on a >= 100G scratch device:
>>
>> # ./src/log-writes/replay-log --log log.xfs.testfile5.mark0 \
>>     --replay $SCRATCH_DEV --end-mark testfile5.mark0
>> # mount $SCRATCH_DEV $SCRATCH_MNT
>> # umount $SCRATCH_MNT
>> # xfs_repair -n $SCRATCH_DEV
>> Phase 1 - find and verify superblock...
>> Phase 2 - using internal log
>>         - zero log...
>>         - scan filesystem freespace and inode maps...
>>         - found root inode chunk
>> Phase 3 - for each AG...
>>         - scan (but don't clear) agi unlinked lists...
>>         - process known inodes and perform inode discovery...
>>         - agno = 0
>>
>> fatal error -- illegal state 13 in block map 376
>>
>> Can anyone provide some insight?
>
> Looks like I missed a couple of extent states in process_bmbt_reclist_int.
>
> What happens if you add the following (only compile tested) patch to
> xfsprogs?

This is what happens:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
data fork in regular inode 134 claims CoW block 376
correcting nextents for inode 134
bad data fork in inode 134
would have cleared inode 134
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
unknown block state, ag 0, block 376
unknown block state, ag 1, block 16
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
entry "testfile2" in shortform directory 128 references free inode 134
        - agno = 3
would have junked entry "testfile2" in directory inode 128
imap claims in-use inode 134 is free, would correct imap
Missing reverse-mapping record for (0/376) len 1 owner 134 off 19
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.

>
> (Normally I'd say send a metadump too for us mere mortals to work with,
> though I'm about to plunge into weddingland so I likely won't be able to
> do much until the 18th.)
>

Attached (created with xfs_metadump -ao).
Soon we will all be gods with powers to replay history ;)

> ((Eric: If this doesn't turn out to be a totally garbage patch, feel
> free to add it to xfsprogs.))
>
> --D
>
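A note on reading the fsx log of process #5: judging by the byte counts fsx prints, "thru" marks an inclusive end offset (0x3ffff - 0x3f959 + 1 = 0x6a7) while "to" marks an exclusive one (0x34f74 - 0x3307e = 0x1ef6). The small POSIX shell sketch that follows decodes both ranges and maps them to file block numbers. The 4096-byte block size is an assumption, since the thread does not state the filesystem block size; the variable names are illustrative only.

```shell
#!/bin/sh
# Decode the two data operations fsx process #5 logged before its fsync.
# Offsets are copied from the log; bs=4096 is an ASSUMED block size.
bs=4096

# "read 0x3f959 thru 0x3ffff": "thru" is an inclusive end offset.
r_start=$((0x3f959)); r_end=$((0x3ffff))
printf 'read: 0x%x bytes, file blocks %d-%d\n' \
    $((r_end - r_start + 1)) $((r_start / bs)) $((r_end / bs))

# "zero from 0x3307e to 0x34f74": "to" is an exclusive end offset,
# so the last byte touched is z_end - 1.
z_start=$((0x3307e)); z_end=$((0x34f74))
printf 'zero: 0x%x bytes, file blocks %d-%d\n' \
    $((z_end - z_start)) $((z_start / bs)) $(((z_end - 1) / bs))
```

Under that assumption it prints `read: 0x6a7 bytes, file blocks 63-63` and `zero: 0x1ef6 bytes, file blocks 51-52`, i.e. before the fsync the process only touched blocks 51-52 and the last block of the 64-block (256K) cloned file.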
Attachment:
metadump.xfs.testfile5.mark0.bz2
Description: BZip2 compressed data