Re: [PATCH 2/2] fstests: btrfs/011: Handle finished scrub/replace operation gracefully




On Wed, Sep 18, 2019 at 02:56:26PM +0800, Qu Wenruo wrote:
> [BUG]
> When btrfs/011 is executed on a fast enough system (a fully memory-backed
> VM whose test device uses the unsafe cache mode), the test can fail like
> this:
> 
>   btrfs/011 43s ... [failed, exit status 1]- output mismatch (see /home/adam/xfstests-dev/results//btrfs/011.out.bad)
>     --- tests/btrfs/011.out     2019-07-22 14:13:44.643333326 +0800
>     +++ /home/adam/xfstests-dev/results//btrfs/011.out.bad      2019-09-18 14:49:28.308798022 +0800
>     @@ -1,3 +1,4 @@
>      QA output created by 011
>      *** test btrfs replace
>     -*** done
>     +failed: '/usr/bin/btrfs replace cancel /mnt/scratch'
>     +(see /home/adam/xfstests-dev/results//btrfs/011.full for details)
>     ...
> 
> [CAUSE]
> Looking into the full output, it shows:
>   ...
>   Replace from /dev/mapper/test-scratch1 to /dev/mapper/test-scratch2
> 
>   # /usr/bin/btrfs replace start -f /dev/mapper/test-scratch1 /dev/mapper/test-scratch2 /mnt/scratch
>   # /usr/bin/btrfs replace cancel /mnt/scratch
>   INFO: ioctl(DEV_REPLACE_CANCEL)"/mnt/scratch": not started
>   failed: '/usr/bin/btrfs replace cancel /mnt/scratch'
> 
> This means the replace had already finished before we tried to cancel
> it. On a fast system this is very common.
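The race can be illustrated without btrfs at all. Below is a minimal bash sketch, assuming a short background `sleep` is a fair stand-in for a fast `replace start` (no `-B`) and `kill` a stand-in for `replace cancel`; `simulate_cancel_after_fixed_sleep` is a hypothetical helper, not part of fstests.

```shell
#!/bin/bash
# Hypothetical helper, not part of fstests. A background "sleep" stands
# in for 'btrfs replace start' without -B, and "kill" stands in for
# 'btrfs replace cancel'. No btrfs commands are run.
simulate_cancel_after_fixed_sleep() {
	local op_time=$1    # how long the stand-in operation runs (seconds)
	local wait_time=$2  # fixed delay before trying to cancel (seconds)
	local pid result

	# Start the operation in the background and remember its PID.
	sleep "$op_time" &
	pid=$!

	# Fixed delay, like the 'sleep 1' in btrfs/011.
	sleep "$wait_time"

	# If the operation has already exited, kill fails, much like
	# 'replace cancel' reporting "not started".
	if kill "$pid" 2>/dev/null; then
		result=canceled
	else
		result=finished
	fi

	# Reap the background job either way and ignore its exit status.
	wait "$pid" 2>/dev/null || true
	echo "$result"
}

simulate_cancel_after_fixed_sleep 0.1 1   # fast operation: prints finished
simulate_cancel_after_fixed_sleep 5 0.1   # slow operation: prints canceled
```

Any fixed delay only moves the window: whichever side finishes first decides the outcome, which is why the status output has to accept both results.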

Does generating a heavier load and more data make the replace operation
last longer? E.g. make more 'noise' by running fsstress instead of
dumping /dev/urandom before starting the replace.

And does sleeping for a shorter time (0.5s?) before the cancel work?

Thanks,
Eryu

> 
> [FIX]
> Instead of using _run_btrfs_util_prog, which requires a return value of 0,
> just call "$BTRFS_UTIL_PROG replace cancel", discard its stdout/stderr,
> and rely entirely on the "$BTRFS_UTIL_PROG replace status" output to
> verify the result.
> 
> Furthermore, if the replace finished before we could cancel it, we
> should replace again to switch the devices back; otherwise btrfs check
> will fail after the test case, as there is no valid btrfs left on the
> replaced device.
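The outcome check described in [FIX] can be sketched as a standalone bash function so the logic is testable without a btrfs filesystem. This is a minimal sketch under stated assumptions: `classify_replace_status` and the sample status line are hypothetical, the status wording is assumed rather than quoted from btrfs-progs, and the real test simply greps the saved output in `$tmp.tmp`.

```shell
#!/bin/bash
# Hypothetical helper, not part of fstests: classify saved
# "$BTRFS_UTIL_PROG replace status" output with grep -q, as the patch
# does, but spell out the three possible outcomes:
#   canceled   - the cancel won the race; nothing to undo
#   finished   - the replace completed first; the devices must be
#                replaced back before the final fsck
#   unexpected - neither keyword found; the test should _fail
classify_replace_status() {
	local status_text=$1

	if grep -q -e canceled <<< "$status_text"; then
		echo canceled
	elif grep -q -e finished <<< "$status_text"; then
		echo finished
	else
		echo unexpected
	fi
}

# Sample status line; the exact wording is assumed for illustration.
status='Started on 18.Sep 14:49:28, finished on 18.Sep 14:49:29, 0 write errs, 0 uncorr. read errs'
case "$(classify_replace_status "$status")" in
	canceled)
		;;  # nothing to undo
	finished)
		# The patch runs: replace start -Bf $replace_options
		#                 $target_dev $source_dev $SCRATCH_MNT
		echo "replace back: target_dev -> source_dev" ;;
	*)
		echo "btrfs replace status (canceled) failed" >&2 ;;
esac
```

Checking for "canceled" before "finished" is a choice of this sketch; the patch accepts either keyword with a single grep and then decides separately whether to swap the devices back.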
> 
> Signed-off-by: Qu Wenruo <wqu@xxxxxxxx>
> ---
>  tests/btrfs/011 | 16 ++++++++++++++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/tests/btrfs/011 b/tests/btrfs/011
> index 89bb4d11..858b00e8 100755
> --- a/tests/btrfs/011
> +++ b/tests/btrfs/011
> @@ -148,13 +148,25 @@ btrfs_replace_test()
>  		# background the replace operation (no '-B' option given)
>  		_run_btrfs_util_prog replace start -f $replace_options $source_dev $target_dev $SCRATCH_MNT
>  		sleep 1
> -		_run_btrfs_util_prog replace cancel $SCRATCH_MNT
> +		# On a fast system the replace can finish within 1s, so the
> +		# cancel may fail; ignore its output and rely on the later
> +		# status output to determine the result
> +		$BTRFS_UTIL_PROG replace cancel $SCRATCH_MNT &> /dev/null
>  
>  		# 'replace status' waits for the replace operation to finish
>  		# before the status is printed
>  		$BTRFS_UTIL_PROG replace status $SCRATCH_MNT > $tmp.tmp 2>&1
>  		cat $tmp.tmp >> $seqres.full
> -		grep -q canceled $tmp.tmp || _fail "btrfs replace status (canceled) failed"
> +		grep -q -e canceled -e finished $tmp.tmp ||\
> +			_fail "btrfs replace status (canceled) failed"
> +
> +		# If the replace finished before the cancel, replace the devices
> +		# back, or the final fsck after the test case will fail as there
> +		# is no btrfs on $source_dev anymore
> +		if grep -q -e finished $tmp.tmp ; then
> +			$BTRFS_UTIL_PROG replace start -Bf $replace_options \
> +				$target_dev $source_dev $SCRATCH_MNT
> +		fi
>  	else
>  		if [ "${quick}Q" = "thoroughQ" ]; then
>  			# On current hardware, the thorough test runs
> -- 
> 2.22.0


