On Wed, Nov 27, 2024 at 03:51:32PM +1100, Dave Chinner wrote: > From: Dave Chinner <dchinner@xxxxxxxxxx> > > Lots of tests run fsstress in the background and then have to kill > it and/or provide special cleanup functions to kill the background > fsstress processes. They typically use $KILLALL_PROG for this. > > Use of killall is problematic for running multiple tests in parallel > in that one test can kill other tests' processes. However, because > fsstress itself forks and runs children, there are very few avenues > for shell scripts to ensure all the fsstress processes actually die. > > With bash, it is especially nasty, because sending SIGTERM will > result in bash outputting error messages ("Killed: ..." that will > cause golden output mismatches and hence test failures. Hence we > also need to be able to tell the main fstress process to die without > triggering these messages. > > To avoid the process tracking problems, we change to use pkill > rather than killall (more options for process selection) and we > stop using the $here/ltp/fsstress binary. Instead, we copy the > $here/ltp/fsstress to $TEST_DIR/$seq.fsstress so that the test has > a unique fsstress binary name. This allows the pkill filter to > select just the fsstress processes the test has run. The fsstress > binary name is held in _FSSTRESS_NAME, and the program to run is > _FSSTRESS_PROG. > > We also track the primary fsstress process ID, and store that in > _FSSTRESS_PID. We do this so that we have a PID to wait against so > that we don't return before the fsstress processes are dead. To this > end, we add a SIGPIPE handler to the primary process so that it > dying doesn't trigger bash 'killed' message output. We can > send 'pkill -PIPE $_FSSTRESS_NAME' to all the fsstress processes and > the primary process will then enter the "wait for children to die" > processing loop before it exits. In this way, we can wait for the > primary fsstress process and when it exits we know that all it's > children have also finished and gone away. This makes killing > fsstress invocations reliable and noise free. > > This is accomplished by the helpers added to common/rc: > > _run_fsstress > _run_fsstress_bg > _wait_for_fsstress > _kill_fstress > > This also means that all fsstress invocations now obey > FSSTRESS_AVOID environment restrictions, many of which didn't. > > We add a call to _kill_fstress into the generic _cleanup() function. > This means that tests using fsstress don't need to add a special > local _cleanup function just to call _kill_fsstress() so that > background fsstress processes are killed when the user interrupts > the tests with ctrl-c. > > Further, killall in the _cleanup() function is often used to attempt > to expedite killing of foreground execution fsstress processes. This > doesn't actually work because of the way bash processes interupt > signals. That is, it waits for the currently executing process to > finish execution, then runs the trap function. Hence a foreground > fsstress won't ever be interrupted by ctrl-c. By implementing > _run_fsstress() as a background process and a wait call, the wait() > call is interrupted by the signal and the cleanup trap is run > immediately. Hence the fsstress processes are killed immediately and > the test exits cleanly almost immediately. > > The result of all this is common, clean handling of fsstress > execution and termination. There are a few exceptions for special > cases, but the vast majority of tests that run fsstress use the > above four wrapper functions exclusively. > > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx> > --- [snip] > diff --git a/tests/btrfs/028 b/tests/btrfs/028 > index f64fc831d..85e42f31e 100755 > --- a/tests/btrfs/028 > +++ b/tests/btrfs/028 > @@ -32,8 +32,7 @@ args=`_scale_fsstress_args -z \ > -f fsync=10 -n 100000 -p 2 \ > -d $SCRATCH_MNT/stress_dir` > echo "Run fsstress $args" >>$seqres.full > -$FSSTRESS_PROG $args >>$seqres.full & > -fsstress_pid=$! > +_run_fsstress_bg $args > > echo "Start balance" >>$seqres.full > _btrfs_stress_balance -d $SCRATCH_MNT >/dev/null 2>&1 & > @@ -41,8 +40,7 @@ balance_pid=$! > > # 30s is enough to trigger bug > sleep $((30*$TIME_FACTOR)) > -kill $fsstress_pid &> /dev/null > -wait $fsstress_pid &> /dev/null > +_kill_fsstress > _btrfs_kill_stress_balance_pid $balance_pid I'll fix the conflict of this patch with: https://lore.kernel.org/fstests/20241129031846.rije5u7o5o64v6gv@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/T/#m8174317696134bdc64d27c2f3012f7f843135f91 by below change [1], if you have more suggestions, feel free to tell me. Thanks, Zorro [1] diff --git a/tests/btrfs/028 b/tests/btrfs/028 index 4c6396ddb..b41e1077a 100755 --- a/tests/btrfs/028 +++ b/tests/btrfs/028 @@ -20,10 +20,7 @@ _cleanup() if [ ! -z "$balance_pid" ]; then _btrfs_kill_stress_balance_pid $balance_pid fi - if [ ! -z "$fsstress_pid" ]; then - kill $fsstress_pid &> /dev/null - wait $fsstress_pid &> /dev/null - fi + _kill_fsstress } . ./common/filter @@ -55,7 +52,7 @@ balance_pid=$! sleep $((30*$TIME_FACTOR)) _kill_fsstress _btrfs_kill_stress_balance_pid $balance_pid -unset fsstress_pid balance_pid +unset balance_pid