On Wed, May 20, 2020 at 12:32 PM Qu Wenruo <quwenruo.btrfs@xxxxxxx> wrote: > > > > On 2020/5/20 下午6:02, Qu Wenruo wrote: > > > > > > On 2020/5/20 下午5:29, Filipe Manana wrote: > >> On Wed, May 20, 2020 at 8:59 AM Qu Wenruo <wqu@xxxxxxxx> wrote: > >>> > >>> Test if canceling a running balance can cause later balance to dead > >>> loop. > >>> > >>> The ifx is titled "btrfs: relocation: Clear the DEAD_RELOC_TREE bit for > >>> orphan roots to prevent runaway balance". > >>> > >>> Signed-off-by: Qu Wenruo <wqu@xxxxxxxx> > >>> --- > >>> Changelog: > >>> v2: > >>> - Remove lsof debug output > >>> v3: > >>> - Remove ps debug output > >>> --- > >>> tests/btrfs/213 | 64 +++++++++++++++++++++++++++++++++++++++++++++ > >>> tests/btrfs/213.out | 2 ++ > >>> tests/btrfs/group | 1 + > >>> 3 files changed, 67 insertions(+) > >>> create mode 100755 tests/btrfs/213 > >>> create mode 100644 tests/btrfs/213.out > >>> > >>> diff --git a/tests/btrfs/213 b/tests/btrfs/213 > >>> new file mode 100755 > >>> index 00000000..f56b0279 > >>> --- /dev/null > >>> +++ b/tests/btrfs/213 > >>> @@ -0,0 +1,64 @@ > >>> +#! /bin/bash > >>> +# SPDX-License-Identifier: GPL-2.0 > >>> +# Copyright (C) 2020 SUSE Linux Products GmbH. All Rights Reserved. > >>> +# > >>> +# FS QA Test 213 > >>> +# > >>> +# Test if canceling a running balance can lead to dead looping balance > >>> +# > >>> +seq=`basename $0` > >>> +seqres=$RESULT_DIR/$seq > >>> +echo "QA output created by $seq" > >>> + > >>> +here=`pwd` > >>> +tmp=/tmp/$$ > >>> +status=1 # failure is the default! > >>> +trap "_cleanup; exit \$status" 0 1 2 3 15 > >>> + > >>> +_cleanup() > >>> +{ > >>> + cd / > >>> + rm -f $tmp.* > >>> +} > >>> + > >>> +# get standard environment, filters and checks > >>> +. ./common/rc > >>> +. ./common/filter > >>> + > >>> +# remove previous $seqres.full before test > >>> +rm -f $seqres.full > >>> + > >>> +# Modify as appropriate. > >>> +_supported_fs btrfs > >>> +_supported_os Linux > >>> +_require_scratch > >>> +_require_command "$KILLALL_PROG" killall > >>> + > >>> +_scratch_mkfs >> $seqres.full > >>> +_scratch_mount > >>> + > >>> +runtime=4 > >>> + > >>> +# Create enough IO so that we need around $runtime seconds to relocate it > >>> +dd if=/dev/zero bs=1M of="$SCRATCH_MNT/file" oflag=sync status=none \ > >>> + &> /dev/null & > >>> +dd_pid=$! > >>> +sleep $runtime > >>> +"$KILLALL_PROG" -q dd &> /dev/null > >> > >> Do we really need the killall program? There's only one dd process. > >> > >> We should also kill the dd process at _cleanup(), as killing the test > >> during the sleep above will result in the dd process not being killed. > > > > The main problem here is, I can't find a good way to kill dd. > > plain 'kill $dd_pid' doesn't seem to kill it properly, as my debug > > lsof/ps still shows dd running and failed to unmount the fs. > > > > 'kill -KILL $dd_pid' kills it well, but causes extra output for the > > terminated dd. > > > > Only 'killall -q dd' works as expected. > > > > Any good advice on this? > > Oh, the fstests "kindly" overrides dd, "kill $dd_pid" only kills the > child bash running that dd function, not the real dd command. > I'll use xfs_io to ensure we get no extra wrapper. Indeed, I was doing this in the test: ps -eo pid,ppid,pgid,comm > /tmp/ps.txt my_pid=$$ echo "dd_pid = $dd_pid mypid = $my_pid" >> /tmp/ps.txt kill -9 $dd_pid wait $dd_pid And was noticing that the ps.txt file had: 23467 937 23467 check ... 23840 23467 23467 213 ... 24034 23840 23467 213 24039 24034 23467 dd ... dd_pid = 24034 mypid = 23840 So dd_pid matched a bash process running the test, which is a child of the bash process that invoked dd. Before digging further I saw your reply and checked that common/rc provides a function named dd() which is a wrapper to dd and it invokes dd twice, whence you needed killall. Interesting, wasn't aware of that wrapper. Thanks. > > Thanks, > Qu > > > > >> > >>> +wait $dd_pid > >>> + > >>> +# Now balance should take at least $runtime seconds, we can cancel it at > >>> +# $runtime/2 to ensure a success cancel. > >>> +$BTRFS_UTIL_PROG balance start -d --bg "$SCRATCH_MNT" > >>> +sleep $(($runtime / 2)) > >>> +$BTRFS_UTIL_PROG balance cancel "$SCRATCH_MNT" > >>> + > >>> +# Now check if we can finish relocating metadata, which should finish very > >>> +# quickly > >>> +$BTRFS_UTIL_PROG balance start -m "$SCRATCH_MNT" >> $seqres.full > >> > >> Why redirect this one to $seqres.full and not the other balance? What > >> kind of useful information it provides? > > > > Not really much, just an indicator to show that the balance finishes as > > expected. > > And we don't want to output it golden output, as mkfs change may create > > more metadata chunks and cause difference. > > > > For other ones, like the one to be canceled, we don't really care that much. > > > > Thanks, > > Qu > >> > >>> + > >>> +echo "Silence is golden" > >>> + > >>> +# success, all done > >>> +status=0 > >>> +exit > >>> diff --git a/tests/btrfs/213.out b/tests/btrfs/213.out > >>> new file mode 100644 > >>> index 00000000..bd8f2430 > >>> --- /dev/null > >>> +++ b/tests/btrfs/213.out > >>> @@ -0,0 +1,2 @@ > >>> +QA output created by 213 > >>> +Silence is golden > >>> diff --git a/tests/btrfs/group b/tests/btrfs/group > >>> index 8d65bddd..fe4d5bb3 100644 > >>> --- a/tests/btrfs/group > >>> +++ b/tests/btrfs/group > >>> @@ -215,3 +215,4 @@ > >>> 210 auto quick qgroup snapshot > >>> 211 auto quick log prealloc > >>> 212 auto balance dangerous > >>> +213 auto fast balance dangerous > >> > >> fast -> quick > >> > >> Thanks. > >> > >>> -- > >>> 2.26.2 > >>> > >> > >> > > > -- Filipe David Manana, “Whether you think you can, or you think you can't — you're right.”