On Fri, Sep 27, 2019 at 10:59 AM Qu Wenruo <wqu@xxxxxxxx> wrote: > > [BUG] > When btrfs/011 is executed on a fast enough system (fully memory backed > VM, with test device has unsafe cache mode), the test can fail like > this: > > btrfs/011 43s ... [failed, exit status 1]- output mismatch (see /home/adam/xfstests-dev/results//btrfs/011.out.bad) > --- tests/btrfs/011.out 2019-07-22 14:13:44.643333326 +0800 > +++ /home/adam/xfstests-dev/results//btrfs/011.out.bad 2019-09-18 14:49:28.308798022 +0800 > @@ -1,3 +1,4 @@ > QA output created by 011 > *** test btrfs replace > -*** done > +failed: '/usr/bin/btrfs replace cancel /mnt/scratch' > +(see /home/adam/xfstests-dev/results//btrfs/011.full for details) > ... > > [CAUSE] > Looking into the full output, it shows: > ... > Replace from /dev/mapper/test-scratch1 to /dev/mapper/test-scratch2 > > # /usr/bin/btrfs replace start -f /dev/mapper/test-scratch1 /dev/mapper/test-scratch2 /mnt/scratch > # /usr/bin/btrfs replace cancel /mnt/scratch > INFO: ioctl(DEV_REPLACE_CANCEL)"/mnt/scratch": not started > failed: '/usr/bin/btrfs replace cancel /mnt/scratch' > > So this means the replace is already finished before we cancel it. > For fast system, it's very common. > > [FIX] > In fill_scratch() after all the original file creations, do a timer > based direct IO write. > The extra write will take 2 * $wait_time, utilizing direct IO with 64K > block size, the write performance should be very comparable (although a > little faster) to replace performance. > > So later cancel should be able to really cancel the dev-replace without > it finished too early. > > Also, do extra check about the above write. If we hit ENOSPC we just > skip the test as the system is really too fast and the fs is not large > enough. > > Signed-off-by: Qu Wenruo <wqu@xxxxxxxx> Reviewed-by: Filipe Manana <fdmanana@xxxxxxxx> Looks good, thanks. > --- > changelog > v2: > - Remove one confusing comment > - Change the _notrun message to focus on too small fs > --- > tests/btrfs/011 | 42 +++++++++++++++++++++++++++++++++++------- > 1 file changed, 35 insertions(+), 7 deletions(-) > > diff --git a/tests/btrfs/011 b/tests/btrfs/011 > index 89bb4d11..de424f87 100755 > --- a/tests/btrfs/011 > +++ b/tests/btrfs/011 > @@ -34,7 +34,7 @@ _cleanup() > kill -TERM $noise_pid > fi > wait > - rm -f $tmp.tmp > + rm -f $tmp.* > # we need this umount and couldn't rely on _require_scratch to umount > # it from next test, because we would replace SCRATCH_DEV, which is > # needed by _require_scratch, and make it umounted. > @@ -54,13 +54,17 @@ _require_scratch_dev_pool_equal_size > _require_command "$WIPEFS_PROG" wipefs > > rm -f $seqres.full > -rm -f $tmp.tmp > +rm -f $tmp.* > > echo "*** test btrfs replace" > > +# In seconds > +wait_time=1 > + > fill_scratch() > { > local fssize=$1 > + local filler_pid > > # Fill inline extents. > for i in `seq 1 500`; do > @@ -75,6 +79,30 @@ fill_scratch() > for i in `seq $fssize`; do > cp $SCRATCH_MNT/t0 $SCRATCH_MNT/t$i || _fail "cp failed" > done > /dev/null 2>> $seqres.full > + > + # Ensure we have enough data so that dev-replace would take at least > + # 2 * $wait_time, allowing we cancel the running replace. > + # Some extra points: > + # - Use XFS_IO_PROG instead of dd > + # fstests wraps dd, making it pretty hard to kill the real dd pid > + # - Use 64K block size with Direct IO > + # 64K is the same stripe size used in replace/scrub. Using Direct IO > + # ensure the IO speed is near device limit and comparable to replace > + # speed. > + $XFS_IO_PROG -f -d -c "pwrite -b 64k 0 1E" "$SCRATCH_MNT/t_filler" &>\ > + $tmp.filler_result & > + filler_pid=$! > + sleep $((2 * $wait_time)) > + kill -KILL $filler_pid &> /dev/null > + wait $filler_pid &> /dev/null > + > + # If the system is too fast and the fs is too small, then skip the test > + if grep -q "No space left" $tmp.filler_result; then > + ls -alh $SCRATCH_MNT >> $seqres.full > + cat $tmp.filler_result >> $seqres.full > + _notrun "fs too small for this test" > + fi > + cat $tmp.filler_result > sync; sync > } > > @@ -147,7 +175,7 @@ btrfs_replace_test() > if [ "${with_cancel}Q" = "cancelQ" ]; then > # background the replace operation (no '-B' option given) > _run_btrfs_util_prog replace start -f $replace_options $source_dev $target_dev $SCRATCH_MNT > - sleep 1 > + sleep $wait_time > _run_btrfs_util_prog replace cancel $SCRATCH_MNT > > # 'replace status' waits for the replace operation to finish > @@ -157,10 +185,10 @@ btrfs_replace_test() > grep -q canceled $tmp.tmp || _fail "btrfs replace status (canceled) failed" > else > if [ "${quick}Q" = "thoroughQ" ]; then > - # On current hardware, the thorough test runs > - # more than a second. This is a chance to force > - # a sync in the middle of the replace operation. > - (sleep 1; sync) > /dev/null 2>&1 & > + # The thorough test runs around 2 * $wait_time seconds. > + # This is a chance to force a sync in the middle of the > + # replace operation. > + (sleep $wait_time; sync) > /dev/null 2>&1 & > fi > _run_btrfs_util_prog replace start -Bf $replace_options $source_dev $target_dev $SCRATCH_MNT > > -- > 2.22.0 > -- Filipe David Manana, “Whether you think you can, or you think you can't — you're right.”