On 2019/9/18 下午3:54, Anand Jain wrote: > On 18/9/19 2:56 PM, Qu Wenruo wrote: >> [BUG] >> When btrfs/011 is executed on a fast enough system (fully memory backed >> VM, with test device has unsafe cache mode), the test can fail like >> this: >> >> btrfs/011 43s ... [failed, exit status 1]- output mismatch (see >> /home/adam/xfstests-dev/results//btrfs/011.out.bad) >> --- tests/btrfs/011.out 2019-07-22 14:13:44.643333326 +0800 >> +++ /home/adam/xfstests-dev/results//btrfs/011.out.bad >> 2019-09-18 14:49:28.308798022 +0800 >> @@ -1,3 +1,4 @@ >> QA output created by 011 >> *** test btrfs replace >> -*** done >> +failed: '/usr/bin/btrfs replace cancel /mnt/scratch' >> +(see /home/adam/xfstests-dev/results//btrfs/011.full for details) >> ... >> >> [CAUSE] >> Looking into the full output, it shows: >> ... >> Replace from /dev/mapper/test-scratch1 to /dev/mapper/test-scratch2 >> >> # /usr/bin/btrfs replace start -f /dev/mapper/test-scratch1 >> /dev/mapper/test-scratch2 /mnt/scratch >> # /usr/bin/btrfs replace cancel /mnt/scratch >> INFO: ioctl(DEV_REPLACE_CANCEL)"/mnt/scratch": not started >> failed: '/usr/bin/btrfs replace cancel /mnt/scratch' >> >> So this means the replace is already finished before we cancel it. >> For fast system, it's very common. >> >> [FIX] >> Instead of using _run_btrfs_util_prog which requires 0 as return value, >> we just call "$BTRFS_UTIL_PROG replace cancel" and ignore all its >> stderr/stdout, and completely rely on "$BTRFS_UTIL_PROG replace status" >> output to verify the work. >> >> Furthermore if we finished replac before cancelling it, we should >> replace again to switch the device back, or after the test case, btrfs >> check will fail as there is no valid btrfs on that replaced device. >> >> Signed-off-by: Qu Wenruo <wqu@xxxxxxxx> >> --- >> tests/btrfs/011 | 16 ++++++++++++++-- >> 1 file changed, 14 insertions(+), 2 deletions(-) >> >> diff --git a/tests/btrfs/011 b/tests/btrfs/011 >> index 89bb4d11..858b00e8 100755 >> --- a/tests/btrfs/011 >> +++ b/tests/btrfs/011 >> @@ -148,13 +148,25 @@ btrfs_replace_test() >> # background the replace operation (no '-B' option given) >> _run_btrfs_util_prog replace start -f $replace_options >> $source_dev $target_dev $SCRATCH_MNT >> sleep 1 >> - _run_btrfs_util_prog replace cancel $SCRATCH_MNT >> + # 1s is enough for fast system to finish replace, so here we >> + # ignore all the output, completely rely on later status >> + # output to determine >> + $BTRFS_UTIL_PROG replace cancel $SCRATCH_MNT &> /dev/null >> # 'replace status' waits for the replace operation to finish >> # before the status is printed >> $BTRFS_UTIL_PROG replace status $SCRATCH_MNT > $tmp.tmp 2>&1 >> cat $tmp.tmp >> $seqres.full >> - grep -q canceled $tmp.tmp || _fail "btrfs replace status >> (canceled) failed" >> + grep -q -e canceled -e finished $tmp.tmp ||\ >> + _fail "btrfs replace status (canceled) failed" >> + >> + # If replace finished before cancel, replace them back or >> + # the final fsck after test case will fail as there is no btrfs >> + # on the $source_dev anymore >> + if grep -q -e finished $tmp.tmp ; then >> + $BTRFS_UTIL_PROG replace start -Bf $replace_options \ >> + $target_dev $source_dev $SCRATCH_MNT >> + fi >> else >> if [ "${quick}Q" = "thoroughQ" ]; then >> # On current hardware, the thorough test runs >> > > Faced the same problem before. But I don't have a good solution > to fix. Because the idea of the test case appears to test the replace > cancel. This change makes it not to error if replace is not running > in the first place. Then what about _notrun if we find that replace/scrub finished too early? BTW, if we really want to test all regular scrub/replace with cancel/finish combination, we should split this test into 4 test cases, then we can completely afford to skip 2 while still allow the finish cases to run... Thanks, Qu > > Thanks, Anand >
Attachment:
signature.asc
Description: OpenPGP digital signature