On Wed, Dec 01, 2021 at 01:18:52PM -0500, Josef Bacik wrote:
> We've been seeing transient errors with any test that uses a dm device
> for the entirety of the time that we've been running nightly xfstests

I have been hitting it on my test VMs for as long as I can remember, too.
It's really annoying, but fortunately it doesn't happen too often.

> runs. This turns out to be because sometimes we get EBUSY while trying
> to create our new dm device. Generally this is because the test comes
> right after another test that messes with the dm device, and thus we
> still have udev messing around with the device when DM tries to O_EXCL
> the block device.
>
> Add a UDEV_SETTLE_PROG before creating the device to make sure we can
> create our new dm device without getting this transient error.

I suspect this only makes the problem seem to go away, without really
fixing it.

I say that for 2 reasons:

1) All tests that use dm end up calling _dmsetup_remove(), for example
   through _log_writes_remove() or _cleanup_flakey(). Normally those are
   called in the _cleanup() function, which ensures it's done even if the
   test fails for some reason.

   So I don't understand why we need that UDEV_SETTLE_PROG in
   _dmsetup_create().

   And I've seen the EBUSY failure happen even when the previous test did
   not use any dm device;

2) Some tests fail after creating the dm device and using it. For example,
   btrfs/206 often fails when it tries to fsck the filesystem:

   btrfs/206 3s ... [failed, exit status 1]- output mismatch (see /home/fdmanana/git/hub/xfstests/results//btrfs/206.out.bad)
       --- tests/btrfs/206.out 2020-10-16 23:13:46.554152652 +0100
       +++ /home/fdmanana/git/hub/xfstests/results//btrfs/206.out.bad 2021-12-01 21:09:46.317632589 +0000
       @@ -3,3 +3,5 @@
        XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
        wrote 8192/8192 bytes at offset 0
        XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
       +_check_btrfs_filesystem: filesystem on /dev/sdc is inconsistent
       +(see /home/fdmanana/git/hub/xfstests/results//btrfs/206.full for details)
       ...
       (Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/btrfs/206.out /home/fdmanana/git/hub/xfstests/results//btrfs/206.out.bad' to see the entire diff)

   In the .full file I got:

   (...)
   replaying 1239@11201: sector 2173408, size 16384, flags 0x10(METADATA)
   replaying 1240@11234: sector 0, size 0, flags 0x1(FLUSH)
   replaying 1241@11235: sector 128, size 4096, flags 0x12(FUA|METADATA)
   _check_btrfs_filesystem: filesystem on /dev/sdc is inconsistent
   *** fsck.btrfs output ***
   ERROR: cannot open device '/dev/sdc': Device or resource busy
   ERROR: cannot open file system
   Opening filesystem to check...
   *** end fsck.btrfs output
   *** mount output ***

   The EBUSY failure does not happen when the test starts, but somewhere
   in the middle of the replay loop, when the test calls fsck, or when the
   test ends and the fstests framework calls fsck.

   I've seen that with btrfs/172 too, which also uses dm logwrites in a
   similar way.

So to me this suggests 2 things:

1) Calling UDEV_SETTLE_PROG in _dmsetup_create() doesn't solve that
   problem with btrfs/206 (and other tests): the problem there is fsck
   failing to open the scratch device after the test called
   _log_writes_remove() -> _dmsetup_remove(), not a failure to create the
   dm device;

2) The problem is likely something missing in _dmsetup_remove().
   Perhaps add another UDEV_SETTLE_PROG there:

diff --git a/common/rc b/common/rc
index 8e351f17..22b34677 100644
--- a/common/rc
+++ b/common/rc
@@ -4563,6 +4563,7 @@ _dmsetup_remove()
 	$UDEV_SETTLE_PROG >/dev/null 2>&1
 	$DMSETUP_PROG remove "$@" >>$seqres.full 2>&1
 	$DMSETUP_PROG mknodes >/dev/null 2>&1
+	$UDEV_SETTLE_PROG >/dev/null 2>&1
 }
 
 _dmsetup_create()

I can't say if that change to _dmsetup_remove() is correct, or why it's
needed, as I really haven't spent time trying to figure out why the issue
happens.

>
> Signed-off-by: Josef Bacik <josef@xxxxxxxxxxxxxx>
> ---
>  common/rc | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/common/rc b/common/rc
> index 8e351f17..35e861ec 100644
> --- a/common/rc
> +++ b/common/rc
> @@ -4567,6 +4567,7 @@ _dmsetup_remove()
>  
>  _dmsetup_create()
>  {
> +	$UDEV_SETTLE_PROG >/dev/null 2>&1
>  	$DMSETUP_PROG create "$@" >>$seqres.full 2>&1 || return 1
>  	$DMSETUP_PROG mknodes >/dev/null 2>&1
>  	$UDEV_SETTLE_PROG >/dev/null 2>&1
> -- 
> 2.26.3
>
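For reference, a standalone sketch of _dmsetup_remove() with the extra
settle from the diff earlier in this reply, written so it can be run
outside fstests. It only illustrates the ordering and is not a verified
fix. In fstests, DMSETUP_PROG, UDEV_SETTLE_PROG and seqres come from the
test environment; the defaults set here (dmsetup, "udevadm settle" and a
/tmp path) are placeholder assumptions for this sketch only.

```shell
#!/bin/bash
# Placeholder defaults (assumption): fstests normally sets these itself.
: "${DMSETUP_PROG:=dmsetup}"
: "${UDEV_SETTLE_PROG:=udevadm settle}"
: "${seqres:=/tmp/dmsetup-remove-sketch}"

_dmsetup_remove()
{
	# Existing behaviour: let udev finish with the device before removal.
	$UDEV_SETTLE_PROG >/dev/null 2>&1
	$DMSETUP_PROG remove "$@" >>$seqres.full 2>&1
	$DMSETUP_PROG mknodes >/dev/null 2>&1
	# Suggested addition: wait for the udev events triggered by the
	# removal to be processed before returning, so that a later O_EXCL
	# open of the underlying device (e.g. by fsck) is less likely to
	# race with udev and get EBUSY.
	$UDEV_SETTLE_PROG >/dev/null 2>&1
}
```

Settling after `dmsetup mknodes` mirrors the settle that _dmsetup_create()
already does at its end, so both helpers would only return once udev has
drained its event queue.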