Re: [PATCH] fstests: UDEV_SETTLE_PROG before dmsetup create




On Wed, Dec 01, 2021 at 01:18:52PM -0500, Josef Bacik wrote:
> We've been seeing transient errors with any test that uses a dm device
> for the entirety of the time that we've been running nightly xfstests

I've been seeing it on my test VMs for as long as I can remember, too.
It's really annoying, but fortunately it doesn't happen too often.

> runs.  This turns out to be because sometimes we get EBUSY while trying
> to create our new dm device.  Generally this is because the test comes
> right after another test that messes with the dm device, and thus we
> still have udev messing around with the device when DM tries to O_EXCL
> the block device.
> 
> Add a UDEV_SETTLE_PROG before creating the device to make sure we can
> create our new dm device without getting this transient error.

I suspect this only makes the problem seem to go away, without really
fixing it.

I say that for two reasons:

1) All tests that use dm end up calling _dmsetup_remove(), for example through
   _log_writes_remove() or _cleanup_flakey(). Normally those are called from
   the _cleanup() function, which ensures the dm device is removed even if
   the test fails for some reason.

   So I don't understand why we need that UDEV_SETTLE_PROG in _dmsetup_create(),
   since the previous test has already torn down its dm device by then.

   And I've seen the EBUSY failure happen even when the previous tests did
   not use any dm device.

2) Some tests fail after creating the dm device and using it. For example,
   btrfs/206 often fails when it tries to fsck the filesystem:

   btrfs/206 3s ... [failed, exit status 1]- output mismatch (see /home/fdmanana/git/hub/xfstests/results//btrfs/206.out.bad)
        --- tests/btrfs/206.out     2020-10-16 23:13:46.554152652 +0100
        +++ /home/fdmanana/git/hub/xfstests/results//btrfs/206.out.bad      2021-12-01 21:09:46.317632589 +0000
        @@ -3,3 +3,5 @@
        XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
        wrote 8192/8192 bytes at offset 0
        XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
        +_check_btrfs_filesystem: filesystem on /dev/sdc is inconsistent
        +(see /home/fdmanana/git/hub/xfstests/results//btrfs/206.full for details)
        ...

       (Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/btrfs/206.out /home/fdmanana/git/hub/xfstests/results//btrfs/206.out.bad'  to see the entire diff)

    In the .full file I got:

    (...)
    replaying 1239@11201: sector 2173408, size 16384, flags 0x10(METADATA)
    replaying 1240@11234: sector 0, size 0, flags 0x1(FLUSH)
    replaying 1241@11235: sector 128, size 4096, flags 0x12(FUA|METADATA)
    _check_btrfs_filesystem: filesystem on /dev/sdc is inconsistent
    *** fsck.btrfs output ***
    ERROR: cannot open device '/dev/sdc': Device or resource busy
    ERROR: cannot open file system
    Opening filesystem to check...
    *** end fsck.btrfs output
    *** mount output ***

   The EBUSY failure does not happen when the test starts, but somewhere in the
   middle of the replay loop when the test calls fsck, or at the end when the
   fstests framework calls fsck.

   I've seen that with btrfs/172 too, which also uses dm logwrites in a similar way.

So to me this suggests two things:

1) Calling UDEV_SETTLE_PROG in _dmsetup_create() doesn't solve the problem with
   btrfs/206 (and other tests). There, fsck fails to open the scratch device
   after the test called _log_writes_remove() -> _dmsetup_remove(); it is not a
   failure to create the dm device;

2) The problem is likely something missing in _dmsetup_remove(). Perhaps add
   another UDEV_SETTLE_PROG at the end of it:

   diff --git a/common/rc b/common/rc
   index 8e351f17..22b34677 100644
   --- a/common/rc
   +++ b/common/rc
   @@ -4563,6 +4563,7 @@ _dmsetup_remove()
            $UDEV_SETTLE_PROG >/dev/null 2>&1
            $DMSETUP_PROG remove "$@" >>$seqres.full 2>&1
            $DMSETUP_PROG mknodes >/dev/null 2>&1
    +       $UDEV_SETTLE_PROG >/dev/null 2>&1
     }
 
    _dmsetup_create()

  I can't say whether that change to _dmsetup_remove() is correct, or whether
  it's needed, as I haven't really spent time trying to figure out why the
  issue happens.
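
  For illustration, here is roughly what _dmsetup_remove() would look like
  with that extra settle, wrapped in a small standalone script. The
  udev_settle_stub/dmsetup_stub functions and CALL_LOG are hypothetical
  stand-ins so it runs outside fstests (there, UDEV_SETTLE_PROG, DMSETUP_PROG
  and $seqres come from the framework). It only shows the intended call
  order; it says nothing about whether the EBUSY actually goes away.

```shell
#!/bin/bash
# Hypothetical stubs so this sketch runs outside fstests; each stub just
# records that it was called, in order, into CALL_LOG.
CALL_LOG=$(mktemp)
udev_settle_stub() { echo "settle" >>"$CALL_LOG"; }
dmsetup_stub() { echo "dmsetup $*" >>"$CALL_LOG"; }
UDEV_SETTLE_PROG=udev_settle_stub
DMSETUP_PROG=dmsetup_stub
seqres="$(mktemp -d)/dm-test"   # stand-in for the per-test result prefix

# _dmsetup_remove() with the extra settle proposed above: wait for udev
# before removing the mapping and again after the device nodes are
# refreshed, so the next opener of the underlying device (e.g. fsck) is
# intended not to race with udev still handling the remove events.
_dmsetup_remove()
{
	$UDEV_SETTLE_PROG >/dev/null 2>&1
	$DMSETUP_PROG remove "$@" >>$seqres.full 2>&1
	$DMSETUP_PROG mknodes >/dev/null 2>&1
	$UDEV_SETTLE_PROG >/dev/null 2>&1
}

# Example call with a made-up dm device name.
_dmsetup_remove logwrites-test
cat "$CALL_LOG"
```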



> 
> Signed-off-by: Josef Bacik <josef@xxxxxxxxxxxxxx>
> ---
>  common/rc | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/common/rc b/common/rc
> index 8e351f17..35e861ec 100644
> --- a/common/rc
> +++ b/common/rc
> @@ -4567,6 +4567,7 @@ _dmsetup_remove()
>  
>  _dmsetup_create()
>  {
> +	$UDEV_SETTLE_PROG >/dev/null 2>&1
>  	$DMSETUP_PROG create "$@" >>$seqres.full 2>&1 || return 1
>  	$DMSETUP_PROG mknodes >/dev/null 2>&1
>  	$UDEV_SETTLE_PROG >/dev/null 2>&1
> -- 
> 2.26.3
> 


