On Sat, Jun 30, 2018 at 12:57:38AM +0800, Zorro Lang wrote:
> If a user constructs a test that loops repeatedly over the steps below
> on dm-thin, block allocation can fail because discards have not
> completed yet (fixed by a685557 dm thin: handle running out of data
> space vs concurrent discard):
> 1) fill the thin device via a filesystem file
> 2) remove the file
> 3) fstrim
>
> This may also cause a deadlock (a fast device such as a ramdisk helps
> a lot) when racing an fstrim with a filesystem (XFS) shutdown. (Fixed
> by 8c81dd46ef3c Force log to disk before reading the AGF during a
> fstrim)
>
> This case can reproduce both bugs if they are not fixed. If only
> the dm-thin bug is fixed, the test will pass. If only the fs
> bug is fixed, the test will fail. If neither bug is fixed,
> the test will hang.
>
> Signed-off-by: Zorro Lang <zlang@xxxxxxxxxx>
> ---
>
> Hi,
>
> If neither bug is fixed, a loop device backed by tmpfs can help
> reproduce the XFS deadlock:
> 1) mount -t tmpfs tmpfs /tmp
> 2) dd if=/dev/zero of=/tmp/test.img bs=1M count=100
> 3) losetup /dev/loop0 /tmp/test.img
> 4) use /dev/loop0 as SCRATCH_DEV and run this case. The test will hang there.

Personally, I could never reproduce this bug on spindles or SSDs. I believe
many (if not most) people run xfstests on commodity hardware rather than on
very fast disks, and on slow disks the test does not reproduce the bug 100%
of the time. So unless the test defaults to running on a ramdisk, it is
useless IMHO.

>
> A ramdisk can help trigger the race. Maybe an NVMe device can help too, but
> it's hard to reproduce on a general disk.
>

I didn't test it on NVMe, so I can't tell =/

> If the XFS bug is fixed, the steps above can reproduce the dm-thin bug, and
> the test will fail.
>
> Unfortunately, if the dm-thin bug is fixed, this case can't reproduce
> the XFS bug on its own.
>
> Thanks,
> Zorro
>
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (c) 2018 Red Hat Inc. All Rights Reserved.
> +#
> +# FS QA Test 499
> +#
> +# Race test running out of data space with concurrent discard operation on
> +# dm-thin.
> +#
> +# If a user constructs a test that loops repeatedly over the steps below on
> +# dm-thin, block allocation can fail because discards have not completed
> +# yet (fixed by a685557 dm thin: handle running out of data space vs
> +# concurrent discard):
> +# 1) fill the thin device via a filesystem file
> +# 2) remove the file
> +# 3) fstrim
> +#
> +# This may also cause a deadlock when racing an fstrim with a filesystem
> +# (XFS) shutdown. (Fixed by 8c81dd46ef3c Force log to disk before reading
> +# the AGF during a fstrim)
> +
> +# There are two bugs here: one is a dm-thin bug, the other is a filesystem
> +# (especially XFS) bug. The dm-thin bug doesn't handle running out of data
> +# space with concurrent discard well, and it then causes an fs unmount
> +# hang when racing an fstrim with a filesystem shutdown.
> +#
> +# If neither bug has been fixed, the test below may deadlock.
> +# If the fs bug has been fixed but the dm-thin bug hasn't, the test
> +# will fail (no deadlock).
> +# Otherwise the test will pass.

The test looks mostly OK, although I believe it should run on a ramdisk by
default (or not run at all if $SCRATCH_DEV is not a ramdisk).

--
Carlos
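For the "do not run if $SCRATCH_DEV is not a ramdisk" option suggested in the review, a minimal sketch of such a check follows. It assumes the ramdisk is provided by the brd driver, whose block devices use major number 1 (RAMDISK_MAJOR), and that GNU stat is available. The helper name is_ramdisk_dev is hypothetical; it is not an existing xfstests function.

```shell
#!/bin/bash
# Hypothetical helper (not part of xfstests): succeed only if the given
# path is a block device owned by the brd ramdisk driver, i.e. its major
# number is 1 (RAMDISK_MAJOR on Linux).
is_ramdisk_dev()
{
	local dev=$1
	local major_hex

	# Must be a block device: /dev/null, for instance, is a character
	# device that also has major number 1.
	[ -b "$dev" ] || return 1

	# GNU stat prints the major number in hex with %t; -L follows
	# symlinks such as /dev/disk/by-id/* entries.
	major_hex=$(stat -L -c '%t' "$dev") || return 1

	# Convert from hex and compare against the brd major number.
	[ $((16#$major_hex)) -eq 1 ]
}
```

In the test itself this could be combined with xfstests' existing _notrun, for example `is_ramdisk_dev "$SCRATCH_DEV" || _notrun "requires a ramdisk SCRATCH_DEV"`. Note that such a check only recognises brd devices; other fast setups, like the tmpfs-backed loop device described above, would be skipped as well.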