Re: [PATCH] f2fs/008: avoid endless wait




On Tue, Mar 11, 2025 at 04:03:47PM +1100, David Disseldorp wrote:
> "udevadm wait <dev>" without a --timeout=SECONDS parameter will wait
> endlessly. Endless wait can be triggered in f2fs/008 by e.g. using a
> zram device as a SCRATCH_DEV, where "device type is unknown" failure
> sees the /dev/mapper/$vgname-$lvname node never appear.
> 
> Signed-off-by: David Disseldorp <ddiss@xxxxxxx>
> ---
> It might make more sense to add a default timeout to _udev_wait(), but
> that could also call udev{adm }settle.

One of the points of moving to _udev_wait was to explicitly wait for
the specific device to appear or disappear, indicating that the
previous admin operations have completed. 'udevadm settle' does not
do that - it only waits for the global queue of events to drain, and
device configuration failures don't generate udev events.
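
To make that distinction concrete, a rough sketch of the two
behaviours (not the actual _udev_wait implementation):

	# waits for the whole udev event queue to drain; says nothing
	# about whether a specific device node ever appeared
	udevadm settle

	# waits - by default, forever - for one specific node to show up
	udevadm wait /dev/mapper/$vgname-$lvname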

Hence tests that silently fail dm/lvm device setup will keep running
on a device they shouldn't be using, and nobody will realise that the
LVM setup did not complete correctly.

OTOH, _udev_wait() will hang in that situation because it's waiting
for the device node to appear (or disappear) so it can be
immediately used. We don't need udev to finish draining queues - we
only need to wait for the device node to appear and the test is good
to go.

This behaviour makes it obvious when device setup/teardown fails in
some way. That should not happen, and when it does, the _udev_wait
hang forces the failure to be triaged and fixed immediately....

If we add a timeout to _udev_wait(), then we're back to the old
behaviour where device config failures get ignored and the test runs
incorrectly. Except now it is worse, because we also have to wait out
the timeout before the test is run....

> 
>  tests/f2fs/008 | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/tests/f2fs/008 b/tests/f2fs/008
> index 47696f2b..fdb0b0c2 100755
> --- a/tests/f2fs/008
> +++ b/tests/f2fs/008
> @@ -35,9 +35,12 @@ _cleanup()
>  	rm -f $tmp.*
>  }
>  
> -$LVM_PROG pvcreate -f $SCRATCH_DEV >>$seqres.full 2>&1
> -$LVM_PROG vgcreate -f $vgname $SCRATCH_DEV >>$seqres.full 2>&1
> -$LVM_PROG lvcreate -y -L 1024m -n $lvname $vgname >>$seqres.full 2>&1
> +$LVM_PROG pvcreate -f $SCRATCH_DEV >>$seqres.full 2>&1 \
> +	|| _notrun "unabled to create LVM physical volume"
> +$LVM_PROG vgcreate -f $vgname $SCRATCH_DEV >>$seqres.full 2>&1 \
> +	|| _notrun "unabled to create LVM volume group"
> +$LVM_PROG lvcreate -y -L 1024m -n $lvname $vgname >>$seqres.full 2>&1 \
> +	|| _notrun "unabled to create LVM logical volume"

These should be _fail calls, IMO.

AFAICT, there is no reason for LVM config to fail on a valid block
device.  If there is a block device that doesn't support LVM, then
we've got lots of tests that are going to fail the same way and we
should have a _require_scratch_lvm() to notrun those tests for that
type of block device (or fix the block device...).
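
For illustration, a rough sketch of both points; _require_scratch_lvm()
does not exist yet, so its body below is only a guess at what such a
helper might check:

	# in the test: fail loudly instead of quietly skipping
	$LVM_PROG pvcreate -f $SCRATCH_DEV >>$seqres.full 2>&1 || \
		_fail "pvcreate failed on $SCRATCH_DEV"

	# hypothetical common/ helper to notrun up front on block
	# devices that genuinely cannot host LVM
	_require_scratch_lvm()
	{
		_require_scratch
		$LVM_PROG pvcreate --test -f $SCRATCH_DEV >/dev/null 2>&1 || \
			_notrun "SCRATCH_DEV $SCRATCH_DEV does not support LVM"
	}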

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx



