Re: [PATCH] btrfs/213: avoid occasional failure due to already finished balance

[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]



On Fri, May 19, 2023 at 6:05 AM Anand Jain <anand.jain@xxxxxxxxxx> wrote:
>
> On 18/5/23 19:08, fdmanana@xxxxxxxxxx wrote:
> > From: Filipe Manana <fdmanana@xxxxxxxx>
> >
> > btrfs/213 writes data, in 1M extents, for 4 seconds into a file, then
> > triggers a balance and then after 2 seconds it tries to cancel the
> > balance operation. More often than not, this works because the balance
> > is still running after 2 seconds. However it also fails sporadically
> > because balance has finished in less than 2 seconds, which is plausible
> > since data and metadata are cached or other factors such as virtualized
> > environment. When that's the case, it fails like this:
> >
> >    $ ./check btrfs/213
> >    FSTYP         -- btrfs
> >    PLATFORM      -- Linux/x86_64 debian0 6.4.0-rc1-btrfs-next-131+ #1 SMP PREEMPT_DYNAMIC Thu May 11 11:26:19 WEST 2023
> >    MKFS_OPTIONS  -- /dev/sdc
> >    MOUNT_OPTIONS -- /dev/sdc /home/fdmanana/btrfs-tests/scratch_1
> >
> >    btrfs/213 51s ... - output mismatch (see /home/fdmanana/git/hub/xfstests/results//btrfs/213.out.bad)
> >        --- tests/btrfs/213.out        2020-06-10 19:29:03.822519250 +0100
> >        +++ /home/fdmanana/git/hub/xfstests/results//btrfs/213.out.bad 2023-05-17 15:39:32.653727223 +0100
> >        @@ -1,2 +1,3 @@
> >         QA output created by 213
> >        +ERROR: balance cancel on '/home/fdmanana/btrfs-tests/scratch_1' failed: Not in progress
> >         Silence is golden
> >        ...
> >        (Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/btrfs/213.out /home/fdmanana/git/hub/xfstests/results//btrfs/213.out.bad'  to see the entire diff)
> >    Ran: btrfs/213
> >    Failures: btrfs/213
> >    Failed 1 of 1 tests
> >
> > To make it much less likely that balance has already finished before we
> > try to cancel it, unmount and mount again the filesystem before starting
> > balance, to clear cached metadata and data, and also double the time we
> > spend writing 1M data extents. Also ignore when the balance failed because
> > it was already finished when we tried to cancel it.
> >
> > Signed-off-by: Filipe Manana <fdmanana@xxxxxxxx>
> > ---
> >   tests/btrfs/213 | 13 ++++++++++---
> >   1 file changed, 10 insertions(+), 3 deletions(-)
> >
> > diff --git a/tests/btrfs/213 b/tests/btrfs/213
> > index 8a10355c..cca0b3cc 100755
> > --- a/tests/btrfs/213
> > +++ b/tests/btrfs/213
> > @@ -28,7 +28,7 @@ _require_xfs_io_command pwrite -D
> >   _scratch_mkfs >> $seqres.full
> >   _scratch_mount
> >
> > -runtime=4
> > +runtime=8
> >
> >   # Create enough IO so that we need around $runtime seconds to relocate it.
> >   #
> > @@ -39,11 +39,18 @@ sleep $runtime
> >   kill $write_pid
> >   wait $write_pid
> >
> > +# Unmount and mount again the fs to clear any cached data and metadata, so that
> > +# it's less likely balance has already finished when we try to cancel it below.
> > +_scratch_cycle_mount
> > +
> >   # Now balance should take at least $runtime seconds, we can cancel it at
> >   # $runtime/2 to ensure a success cancel.
> >   _run_btrfs_balance_start -d --bg "$SCRATCH_MNT"
>
>
> > -sleep $(($runtime / 2))
> > -$BTRFS_UTIL_PROG balance cancel "$SCRATCH_MNT"
> > +sleep $(($runtime / 4))
> > +# It's possible that balance has already completed. It's unlikely but often
> > +# it may happen due to virtualization, caching and other factors, so ignore
> > +# any error about no balance currently running.
> > +$BTRFS_UTIL_PROG balance cancel "$SCRATCH_MNT" 2>&1 | grep -iv 'not in progress'
>
> Cancel is an important step in this test case.
> Why not call _notrun() if the test case fails to make sure
> the balance is still in progress? This way, it provides
> another opportunity to fix.

Sounds reasonable. Sent a v2 with that.
Thanks.

>
> Thanks, Anand
>
> >
> >   # Now check if we can finish relocating metadata, which should finish very
> >   # quickly.
>




[Index of Archives]     [Linux Filesystems Development]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux