On Mon, Aug 28, 2017 at 11:33 AM, Amir Goldstein <amir73il@xxxxxxxxx> wrote: > Christoph, Darrick, > > As I reported last week, I started running Josef's log-writes crash > tests and immediately got reports on data checksum errors when > running the tests on xfs. > > Unlike ext4 and btrfs, xfs tests seemed to fail arbitrarily for any > value of random seed I tried. Unlike xfs, I never observed data > checksum errors on ext4 and btrfs (only fsck errors). > > It's quite easy to reproduce the reported checksum errors when > running the test currently on my xfstests branch: > https://github.com/amir73il/xfstests/commits/dm-log-writes > > Looking closer at the reported checksum errors, in all cases > I examined, the problem was, that after a sequence of > PUNCH_HOLE+FSYNC on a test file, a partially zeroed block, > both at beginning and end of zero range is not zeroed after > crash. > > For example, the following file does not have zeroes after crash > at end of logical block #11: Block #10 that is... #11 is unmapped. > > --------------------- > Filesystem type is: 58465342 > File size of /mnt/scratch/testfile2 is 248338 (61 blocks of 4096 bytes) > ext: logical_offset: physical_offset: length: expected: flags: > 0: 1.. 3: 33.. 35: 3: unwritten > 1: 10.. 10: 93.. 93: 1: 36: > 2: 20.. 23: 147.. 150: 4: 94: unwritten > 3: 24.. 31: 158.. 165: 8: 151: unwritten > 4: 34.. 34: 146.. 146: 1: 166: unwritten > 5: 35.. 38: 151.. 154: 4: 147: unwritten > 6: 41.. 44: 167.. 170: 4: 155: unwritten > 7: 46.. 46: 166.. 166: 1: 171: > 8: 47.. 50: 89.. 92: 4: 167: > 9: 51.. 60: 171.. 180: 10: 93: last,eof > /mnt/scratch/testfile2: 10 extents found > > /mnt/scratch/testfile2 (bad): > 0000000 0000 0000 0000 0000 0000 0000 0000 0000 > * > 000a510 5858 5858 5858 5858 5858 5858 5858 5858 > * > 000b000 0000 0000 0000 0000 0000 0000 0000 0000 > * > 002e280 0000 0000 0000 0000 0000 0000 5858 5858 > 002e290 5858 5858 5858 5858 5858 5858 5858 5858 > * > 0038720 5858 5858 0000 0000 0000 0000 0000 0000 > 0038730 0000 0000 0000 0000 0000 0000 0000 0000 > * > 003ca12 > ------------------ > > However, this crash checkpoint (testfile2.mark1) was taken > after punch+fsync that should have zeroed the end of block #11 > (0xa988..000b000): > ------------------- > ... > 2: 16 punch from 0xa988 to 0xf126, (0x479e bytes) > 2: 17 read 0x12420 thru 0x1aa08 (0x85e9 bytes) > 2: 18 write 0x30d11 thru 0x3a723 (0x9a13 bytes) > 2: 19 punch from 0x27988 to 0x2aaed, (0x3165 bytes) > 2: 20 write 0x2d6ff thru 0x369f3 (0x92f5 bytes) > 2: 21 zero from 0x22882 to 0x22e14, (0x592 bytes) > 2: 22 zero from 0x14655 to 0x1e636, (0x9fe1 bytes) > 2: 23 zero from 0x17c91 to 0x1fb75, (0x7ee4 bytes) > 2: 24 punch from 0x273eb to 0x3028c, (0x8ea1 bytes) > 2: 25 zero from 0x29eb2 to 0x2c692, (0x27e0 bytes) > 2: 26 zero from 0x11ac to 0x3910, (0x2764 bytes) > 2: truncating to largest ever: 0x3ea12 > 2: 27 trunc from 0x3a724 to 0x3ea12 > 2: 28 collapse from 0x2d000 to 0x2f000, (0x2000 bytes) > 2: 29 falloc from 0x22cf2 to 0x2733b (0x4649 bytes) > 2: 30 mapread 0x3466a thru 0x3ca11 (0x83a8 bytes) > 2: 31 fsync > 2: Dumped fsync buffer to testfile2.mark1 > > /mnt/test/fsxtests/testfile2.mark1 (good): > 0000000 0000 0000 0000 0000 0000 0000 0000 0000 > * > 000a510 5858 5858 5858 5858 5858 5858 5858 5858 > * > 000a980 5858 5858 5858 5858 0000 0000 0000 0000 > 000a990 0000 0000 0000 0000 0000 0000 0000 0000 > * > 002e280 0000 0000 0000 0000 0000 0000 5858 5858 > 002e290 5858 5858 5858 5858 5858 5858 5858 5858 > * > 0038720 5858 5858 0000 0000 0000 0000 0000 0000 > 0038730 0000 0000 0000 0000 0000 0000 0000 0000 > * > 003ca12 > > -------------------------- > > Anyway, I went to look at xfs_zero_range() and while I admit > it was hard for me to follow down all the actors into block > layer, I couldn't find where partial zeroed page is marked dirty. > > Can you please have a look and say what you make of this? > > Thanks, > Amir. > > P.S. if needed I can provide the recorded writes log to replay the > I/O sequence that results in the reported error (it's 13K compressed) > but the problem seems obvious and easy to reproduce using the > xfstest (reproduced at high probability not always).