Re: XFS bug discovered by crash tests?

"Darrick J. Wong" <darrick.wong@xxxxxxxxxx> · Mon, 28 Aug 2017 10:52:29 -0700

On Mon, Aug 28, 2017 at 11:36:38AM +0300, Amir Goldstein wrote:
> On Mon, Aug 28, 2017 at 11:33 AM, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> > Christoph, Darrick,
> >
> > As I reported last week, I started running Josef's log-writes crash
> > tests and immediately got reports on data checksum errors when
> > running the tests on xfs.
> >
> > Unlike ext4 and btrfs, xfs tests seemed to fail arbitrarily for any
> > value of random seed I tried. Unlike xfs, I never observed data
> > checksum errors on ext4 and btrfs (only fsck errors).
> >
> > It's quite easy to reproduce the reported checksum errors when
> > running the test currently on my xfstests branch:
> > https://github.com/amir73il/xfstests/commits/dm-log-writes
> >
> > Looking closer at the reported checksum errors, in all cases
> > I examined, the problem was, that after a sequence of
> > PUNCH_HOLE+FSYNC on a test file, a partially zeroed block,
> > both at beginning and end of zero range is not zeroed after
> > crash.
> >
> > For example, the following file does not have zeroes after crash
> > at end of logical block #11:
> 
> Block #10 that is... #11 is unmapped.
> 
> >
> > ---------------------
> > Filesystem type is: 58465342
> > File size of /mnt/scratch/testfile2 is 248338 (61 blocks of 4096 bytes)
> >  ext:     logical_offset:        physical_offset: length:   expected: flags:
> >    0:        1..       3:         33..        35:      3:             unwritten
> >    1:       10..      10:         93..        93:      1:         36:
> >    2:       20..      23:        147..       150:      4:         94: unwritten
> >    3:       24..      31:        158..       165:      8:        151: unwritten
> >    4:       34..      34:        146..       146:      1:        166: unwritten
> >    5:       35..      38:        151..       154:      4:        147: unwritten
> >    6:       41..      44:        167..       170:      4:        155: unwritten
> >    7:       46..      46:        166..       166:      1:        171:
> >    8:       47..      50:         89..        92:      4:        167:
> >    9:       51..      60:        171..       180:     10:         93: last,eof
> > /mnt/scratch/testfile2: 10 extents found
> >
> > /mnt/scratch/testfile2 (bad):
> > 0000000 0000 0000 0000 0000 0000 0000 0000 0000
> > *
> > 000a510 5858 5858 5858 5858 5858 5858 5858 5858
> > *
> > 000b000 0000 0000 0000 0000 0000 0000 0000 0000
> > *
> > 002e280 0000 0000 0000 0000 0000 0000 5858 5858
> > 002e290 5858 5858 5858 5858 5858 5858 5858 5858
> > *
> > 0038720 5858 5858 0000 0000 0000 0000 0000 0000
> > 0038730 0000 0000 0000 0000 0000 0000 0000 0000
> > *
> > 003ca12
> > ------------------
> >
> > However, this crash checkpoint (testfile2.mark1) was taken
> > after punch+fsync that should have zeroed the end of block #11
> > (0xa988..000b000):
> > -------------------
> > ...
> > 2: 16 punch     from 0xa988 to 0xf126, (0x479e bytes)
> > 2: 17 read      0x12420 thru    0x1aa08 (0x85e9 bytes)
> > 2: 18 write     0x30d11 thru    0x3a723 (0x9a13 bytes)
> > 2: 19 punch     from 0x27988 to 0x2aaed, (0x3165 bytes)
> > 2: 20 write     0x2d6ff thru    0x369f3 (0x92f5 bytes)
> > 2: 21 zero      from 0x22882 to 0x22e14, (0x592 bytes)
> > 2: 22 zero      from 0x14655 to 0x1e636, (0x9fe1 bytes)
> > 2: 23 zero      from 0x17c91 to 0x1fb75, (0x7ee4 bytes)
> > 2: 24 punch     from 0x273eb to 0x3028c, (0x8ea1 bytes)
> > 2: 25 zero      from 0x29eb2 to 0x2c692, (0x27e0 bytes)
> > 2: 26 zero      from 0x11ac to 0x3910, (0x2764 bytes)
> > 2: truncating to largest ever: 0x3ea12
> > 2: 27 trunc     from 0x3a724 to 0x3ea12
> > 2: 28 collapse  from 0x2d000 to 0x2f000, (0x2000 bytes)
> > 2: 29 falloc    from 0x22cf2 to 0x2733b (0x4649 bytes)
> > 2: 30 mapread   0x3466a thru    0x3ca11 (0x83a8 bytes)
> > 2: 31 fsync
> > 2: Dumped fsync buffer to testfile2.mark1
> >
> > /mnt/test/fsxtests/testfile2.mark1 (good):
> > 0000000 0000 0000 0000 0000 0000 0000 0000 0000
> > *
> > 000a510 5858 5858 5858 5858 5858 5858 5858 5858
> > *
> > 000a980 5858 5858 5858 5858 0000 0000 0000 0000
> > 000a990 0000 0000 0000 0000 0000 0000 0000 0000
> > *
> > 002e280 0000 0000 0000 0000 0000 0000 5858 5858
> > 002e290 5858 5858 5858 5858 5858 5858 5858 5858
> > *
> > 0038720 5858 5858 0000 0000 0000 0000 0000 0000
> > 0038730 0000 0000 0000 0000 0000 0000 0000 0000
> > *
> > 003ca12
> >
> > --------------------------
> >
> > Anyway, I went to look at xfs_zero_range() and while I admit
> > it was hard for me to follow down all the actors into block
> > layer, I couldn't find where partial zeroed page is marked dirty.
> >
> > Can you please have a look and say what you make of this?

<shrug> /me would have thought the page gets dirty via:

xfs_zero_range -> iomap_zero_range -> iomap_zero_range_actor ->
iomap_zero -> iomap_write_end -> generic_write_end -> block_write_end ->
__block_commit_write -> mark_buffer_dirty -> __set_page_dirty

But maybe that doesn't happen?

--D

> >
> > Thanks,
> > Amir.
> >
> > P.S. if needed I can provide the recorded writes log to replay the
> > I/O sequence that results in the reported error (it's 13K compressed)
> > but the problem seems obvious and easy to reproduce using the
> > xfstest (reproduced at high probability not always).
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html