On Mon, Jul 02, 2018 at 11:27:11AM +0200, Carlos Maiolino wrote:
> On Sat, Jun 30, 2018 at 12:57:38AM +0800, Zorro Lang wrote:
> > If a user constructs a test that loops repeatedly over below steps
> > on dm-thin, block allocation can fail due to discards not having
> > completed yet (Fixed by a685557 dm thin: handle running out of data
> > space vs concurrent discard):
> > 1) fill thin device via filesystem file
> > 2) remove file
> > 3) fstrim
> >
> > And this maybe cause a deadlock (fast device likes ramdisk can help
> > a lot) when racing a fstrim with a filesystem (XFS) shutdown. (Fixed
> > by 8c81dd46ef3c Force log to disk before reading the AGF during a
> > fstrim)
> >
> > This case can reproduce both two bugs if they're not fixed. If only
> > the dm-thin bug is fixed, then the test will pass. If only the fs
> > bug is fixed, then the test will fail. If both of bugs aren't fixed,
> > the test will hang.
> >
> > Signed-off-by: Zorro Lang <zlang@xxxxxxxxxx>
> > ---
> >
> > Hi,
> >
> > If both of two bugs aren't fixed, a loop device base on tmpfs can help
> > reproduce the XFS deadlock:
> > 1) mount -t tmpfs tmpfs /tmp
> > 2) dd if=/dev/zero of=/tmp/test.img bs=1M count=100
> > 3) losetup /dev/loop0 /tmp/test.img
> > 4) use /dev/loop0 to be SCRATCH_DEV, run this case. The test will hang there.
>
> Particularly, I could never reproduce this bug on spindles or SSDs, and I
> believe many (if not most) people run xfstests on commodity hardware, not on
> very fast disks, and the test doesn't reproduce the bug 100% of the times when
> running on slow disks, so, unless the default for the test is to run it using
> ramdisks, the test is useless IMHO.

As this is a race test, I don't think any case can be 100% reproducible. This
case can already cover the issue under some conditions.

>
> >
> > Ramdisk can help trigger the race. Maybe NVME device can help too. But it's
> > hard to reproduce on general disk.
> >
>
> I didn't test it on NVME, so I can't tell =/

I didn't try NVMe or SSD. From my testing, if the underlying SCRATCH_DEV
supports fstrim, the case can reproduce this bug. For example:

If I create a device by:
# modprobe scsi_debug dev_size_mb=100

Then I can't reproduce this bug. If I create a device by:
# modprobe scsi_debug lbpu=1 lbpws=1 dev_size_mb=100

Then the bug is reproducible:
# ./check generic/499
FSTYP         -- xfs (non-debug)
PLATFORM      -- Linux/x86_64 xxxx 3.10.0-915.el7.x86_64
MKFS_OPTIONS  -- -f -bsize=4096 /dev/sde
MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 /dev/sde /mnt/scratch

generic/499 2s ... [failed, exit status 1]- output mismatch (see /root/git/xfstests-zlang/results//generic/499.out.bad)
    --- tests/generic/499.out 2018-06-29 10:38:58.965827495 -0400
    +++ /root/git/xfstests-zlang/results//generic/499.out.bad 2018-07-02 06:20:34.841313041 -0400
    @@ -1,2 +1,106 @@
     QA output created by 499
    -Silence is golden
    +fstrim: /mnt/scratch: FITRIM ioctl failed: Input/output error
    +fstrim: cannot open /mnt/scratch: Input/output error
    +fstrim: cannot open /mnt/scratch: Input/output error
    +fstrim: cannot open /mnt/scratch: Input/output error
    +fstrim: cannot open /mnt/scratch: Input/output error
    ...
    (Run 'diff -u tests/generic/499.out /root/git/xfstests-zlang/results//generic/499.out.bad' to see the entire diff)
Ran: generic/499
Failures: generic/499
Failed 1 of 1 tests

Thanks,
Zorro

>
> > If the XFS bug is fixed, above steps can reproduce dm-thin bug, the test
> > will fail.
> >
> > Unfortunately, if the dm-thin bug is fixed, then this case can't reproduce
> > the XFS bug singly.
> >
> > Thanks,
> > Zorro
> >
> > +#! /bin/bash
> > +# SPDX-License-Identifier: GPL-2.0
> > +# Copyright (c) 2018 Red Hat Inc. All Rights Reserved.
> > +#
> > +# FS QA Test 499
> > +#
> > +# Race test running out of data space with concurrent discard operation on
> > +# dm-thin.
> > +#
> > +# If a user constructs a test that loops repeatedly over below steps on
> > +# dm-thin, block allocation can fail due to discards not having completed
> > +# yet (Fixed by a685557 dm thin: handle running out of data space vs
> > +# concurrent discard):
> > +# 1) fill thin device via filesystem file
> > +# 2) remove file
> > +# 3) fstrim
> > +#
> > +# And this maybe cause a deadlock when racing a fstrim with a filesystem
> > +# (XFS) shutdown. (Fixed by 8c81dd46ef3c Force log to disk before reading
> > +# the AGF during a fstrim)
> > +
> >
> > +# There're two bugs at here, one is dm-thin bug, the other is filesystem
> > +# (XFS especially) bug. The dm-thin bug can't handle running out of data
> > +# space with concurrent discard well. Then the dm-thin bug cause fs unmount
> > +# hang when racing a fstrim with a filesystem shutdown.
> > +#
> > +# If both of two bugs haven't been fixed, below test maybe cause deadlock.
> > +# Else if the fs bug has been fixed, but the dm-thin bug hasn't. below test
> > +# will cause the test fail (no deadlock).
> > +# Else the test will pass.
>
> The test looks mostly ok, despite the fact I believe this should run on a
> ramdisk by default (or not run, if $SCRATCH_DEV is not a ramdisk)
>
> --
> Carlos
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
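
Since the reproduction above seems to depend on whether the underlying
device advertises discard support (the scsi_debug device only triggered
the failure when loaded with lbpu=1 lbpws=1), a pre-check along the
following lines might be useful. This is only a sketch and not part of
the patch: supports_discard is a hypothetical helper name, and it assumes
the device is a whole disk listed under /sys/block (it does not resolve
partitions or /dev/mapper symlinks). The second argument exists only so
the sysfs root can be overridden for testing.

```shell
#!/bin/bash
# Hypothetical helper (not part of the patch): report whether a block
# device advertises discard support, based on its sysfs queue limits.
#   $1: device node or name, e.g. /dev/sde or sde
#   $2: sysfs root (assumed default /sys; overridable for testing)
supports_discard()
{
	local dev sysfs limit

	dev=$(basename "$1")
	sysfs=${2:-/sys}
	limit="$sysfs/block/$dev/queue/discard_max_bytes"

	# A missing attribute, or a value of 0, means no discard support.
	[ -r "$limit" ] || return 1
	[ "$(cat "$limit")" -gt 0 ] 2>/dev/null
}

# Possible use before running the test (SCRATCH_DEV as in xfstests):
#   supports_discard "$SCRATCH_DEV" || echo "no discard support, race unlikely"
```

A check like this only says that discards reach the device; it says
nothing about whether the device is fast enough to hit the race.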