Hi guys,

I am often getting these errors when running the crash tests with
cloned files (generic/502 in my xfstests patches).

Hitting these errors requires first fixing 2 other issues that
mask this issue:
"xfs: fix incorrect log_flushed on fsync" (in master)
"xfs: fix leftover CoW extent after truncate"
available on my tree based on Darrick's simple fix:
https://github.com/amir73il/linux/commits/xfs-fsync

I get the errors more often (1 out of 5) on a 100G fs on a spinning disk.
On a 10G fs on SSD they are less frequent.
The log in this email was captured on a patched stable 4.9.47 kernel,
but I am getting the same errors on a patched upstream kernel.

I wasn't able to create a deterministic reproducer, so I am attaching
the full log from a failed test along with an IO log that can be
replayed on your disk to examine the outcome.

Following is the output of fsx process #5, which is the process
that wrote the problematic testfile5.mark0 to the log.
This process performs only read, zero, fsync before creating
the log mark.
The file testfile5 was cloned from an origin 256K file before
running fsx.
Later, I used the random seed 35484 from this log for all processes
and it seemed to increase the probability of failure.

# /old/home/amir/src/xfstests-dev/ltp/fsx -N 100 -d -k -P /mnt/test/fsxtests -i /dev/mapper/logwrites-test -S 0 -j 5 /mnt/scratch/testfile5
Seed set to 35484
file_size=262144
5: 1 read 0x3f959 thru 0x3ffff (0x6a7 bytes)
5: 2 zero from 0x3307e to 0x34f74, (0x1ef6 bytes)
5: 3 fsync
5: Dumped fsync buffer to testfile5.mark0

In order to get to the crash state you need to get my
xfstests replay-log patches and replay the attached log
on a >= 100G scratch device:

# ./src/log-writes/replay-log --log log.xfs.testfile5.mark0 --replay $SCRATCH_DEV --end-mark testfile5.mark0
# mount $SCRATCH_DEV $SCRATCH_MNT
# umount $SCRATCH_MNT
# xfs_repair -n $SCRATCH_DEV
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0

fatal error -- illegal state 13 in block map 376

Can anyone provide some insight?

Thanks,
Amir.

Attachment:
502.full.xfs.testfile5.mark0
Description: Binary data
Attachment:
log.xfs.testfile5.mark0.bz2
Description: BZip2 compressed data