Re: [PATCH] generic/019: kill background processes on interrupt




On Wed, Apr 13, 2022 at 10:13:35AM +0300, Amir Goldstein wrote:
> On Wed, Apr 13, 2022 at 4:53 AM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> >
> > On Tue, Apr 12, 2022 at 10:25:00PM +0800, Zorro Lang wrote:
> > > On Tue, Apr 12, 2022 at 02:59:42PM +0200, David Disseldorp wrote:
> > > > On Mon, 11 Apr 2022 15:48:33 +1000, Dave Chinner wrote:
> > > >
> > > > > From: Dave Chinner <dchinner@xxxxxxxxxx>
> > > > >
> > > > > If you ctrl-c generic/019, it leaves fsstress processes running.
> > > > > Kill them in the cleanup function so that they don't have to be
> > > > > manually killed after interrupting the test.
> > > > >
> > > > > While touching the _cleanup() function, make it do everything that
> > > > > the generic _cleanup function it overrides does and fix the
> > > > > indenting.
> > > > >
> > > > > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> > > > > ---
> > > > >  tests/generic/019 | 6 ++++--
> > > > >  1 file changed, 4 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/tests/generic/019 b/tests/generic/019
> > > > > index db56dac1..cda107f4 100755
> > > > > --- a/tests/generic/019
> > > > > +++ b/tests/generic/019
> > > > > @@ -53,8 +53,10 @@ stop_fail_scratch_dev()
> > > > >  # Override the default cleanup function.
> > > > >  _cleanup()
> > > > >  {
> > > > > -    disallow_fail_make_request
> > > > > -    rm -f $tmp.*
> > > > > + kill $fs_pid $fio_pid &> /dev/null
> > > > > + disallow_fail_make_request
> > > > > + cd /
> > > > > + rm -r -f $tmp.*
> > > > >  }
> > > > >
> > > > >  RUN_TIME=$((20+10*$TIME_FACTOR))
> > > >
> > > > Might be worth unsetting the "fs_pid" and "fio_pid" variables after the
> > > > wait, but it should be fine as-is:
> > >
> > > I agree. It is better to avoid killing unrelated processes. Or how about
> > > doing this here instead (which avoids killing useful system processes):
> > > $KILLALL_PROG -q $FSSTRESS_PROG
> > > $KILLALL_PROG -q $FIO_PROG
> > >
> > > Another picky question: do we need a while loop that checks until the
> > > processes have really been killed? :)
> >
> > Do we really need to paint the bikeshed over how best to kill a
> > process? I don't have time to do that, this is just a drive-by fix
> > that works for me....
> >
> 
> This is not a kind response to reviewers.
> Is a "drive-by fix" exempt from the review process?
> The review comments are legit even if they could be dismissed
> on technical grounds, because the risk of pid wraparound is quite low.
> 
> I don't think this is about "bikeshedding over how best to kill a process";
> I think it is about how to have better test cleanup practices.

I agree, but this is a broad treewide cleanup, which itself is a
separate project that shouldn't hold up this quick cleanup...

> It would have been nice to have better isolation by having fstests
> run each test within a control group and clean up the control group
> processes after the test, if someone wants to take on this task.

...because there are quite a few places (particularly anything that runs
fsx/fsstress/iogen for fun) where we kick off a group of background
processes and later require a reliable way to shoot them all down.
Fixing all that in a consistent way is a *much* bigger task than what
Dave is trying to accomplish here.
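A minimal sketch of the pattern under discussion, assuming hypothetical helper names `start_workers` and `kill_workers` (not fstests functions) and using `sleep` as a stand-in for fsstress/fio: record each background PID, kill them in cleanup, `wait` until each has really exited, then forget the PIDs so a repeated cleanup cannot hit a recycled PID. It only tracks direct children of the test shell; anything those children fork on their own is not covered.

```bash
#!/bin/bash
# Hypothetical helpers (not fstests API); `sleep` stands in for fsstress/fio.
worker_pids=()

start_workers()
{
	local i
	for i in 1 2 3; do
		sleep 1000 &
		worker_pids+=("$!")
	done
}

kill_workers()
{
	local pid
	# Nothing to do if no workers are recorded.
	[ "${#worker_pids[@]}" -eq 0 ] && return 0
	kill "${worker_pids[@]}" &> /dev/null
	# wait blocks until each child has exited and been reaped, so no
	# polling loop is needed for processes this shell started itself.
	for pid in "${worker_pids[@]}"; do
		wait "$pid" 2> /dev/null
	done
	# Forget the PIDs so a second call cannot signal a recycled PID.
	worker_pids=()
}

start_workers
kill_workers
```

Calling `kill_workers` twice is harmless: the second call finds an empty list and returns immediately.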

The current "scheme" is that ./check will run each test in its own
systemd scope (if available) to try to improve the reliability of test
program cleanup if the _cleanup method itself fails to kill all the
child tasks.  This isn't foolproof because some people refuse to use
systemd, and the systemd tools themselves can't do a whole lot about
processes stuck in D state.

In an ideal world, whoever takes on fixing process cleanup ought to
figure out a more general solution, or at least investigate it more
thoroughly than I did to decide whether it's worth reimplementing
process control groups via bash script for people who do not use
systemd.
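One possible non-systemd fallback, sketched here as an assumption rather than anything ./check does today, uses plain POSIX process groups instead of cgroups. Each load is started under util-linux `setsid` so that it leads a new process group, and a single `kill` with a negative PID then reaches every descendant that stayed in that group. This is weaker than a control group (a process that calls setsid() itself escapes), and, like the systemd tools, it cannot do anything about tasks stuck in D state. `run_in_group`, `kill_group` and `group_pid` are hypothetical names.

```bash
#!/bin/bash
# Hypothetical helpers (not fstests API): run a command as the leader of a new
# session/process group so that it and everything it forks can be signalled
# with a single negative PID.

run_in_group()
{
	local i
	# In a non-interactive script the background job is not a process group
	# leader, so setsid(1) calls setsid() without forking and $! becomes the
	# process group ID of the new group.
	setsid "$@" &
	group_pid=$!
	# Wait (up to ~5s) until the new group exists before anyone signals it.
	for ((i = 0; i < 50; i++)); do
		kill -0 -- "-$group_pid" 2> /dev/null && return 0
		sleep 0.1
	done
	return 1
}

kill_group()
{
	local pgid=$1
	# A negative PID sends the signal to every member of the group; fall
	# back to the leader alone if the group has already gone.
	kill -TERM -- "-$pgid" 2> /dev/null || kill -TERM "$pgid" 2> /dev/null
	# Reap the leader, which is a direct child of this shell.
	wait "$pgid" 2> /dev/null
}

# Example: a shell that forks two sleepers, standing in for fsstress and
# the worker processes it spawns.
run_in_group bash -c 'sleep 1000 & sleep 1000 & wait'
kill_group "$group_pid"
```

The two sleepers are grandchildren of the script, so a PID-list approach would miss them; the group kill reaches them because they inherited the group.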

Does anyone want to take on this task?

> I personally prefer the pattern of a dedicated cleanup trap for aborting the
> test, as in generic/251, which leaves the generic _cleanup on EXIT instead of
> duplicating _cleanup (which generic/251 also duplicates incorrectly),
> but I have no strong feelings about that, so as a "drive-by fix" you may add:
> 
> Reviewed-by: Amir Goldstein <amir73il@xxxxxxxxx>
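A minimal sketch of the split described above, with hypothetical names (`run_test`, `_abort_test`, `tmp`, `load_pid`) and `sleep` standing in for the real load; it is not the actual generic/251 code. The generic cleanup stays on EXIT, and the INT/TERM handler only stops the background load and exits, which then fires the EXIT trap so the generic cleanup runs exactly once.

```bash
#!/bin/bash
# Hypothetical sketch: generic cleanup on EXIT plus a dedicated abort handler
# on INT/TERM. `sleep 1000` stands in for a long-running fsstress/fio load.

_cleanup()
{
	cd /
	rm -rf "$tmp"
}

_abort_test()
{
	# Stop only the background load this test started. Calling exit then
	# fires the EXIT trap, so _cleanup still runs exactly once.
	if [ -n "$load_pid" ]; then
		kill "$load_pid" 2> /dev/null
		wait "$load_pid" 2> /dev/null
	fi
	exit 1
}

run_test()
{
	tmp=$(mktemp -d)
	load_pid=""
	trap _cleanup EXIT
	trap _abort_test INT TERM

	sleep 1000 &
	load_pid=$!
	# Report state to the caller (demo only).
	echo "$tmp $load_pid" > "$1"
	wait "$load_pid"
}
```

To exercise the abort path from a script, run `( run_test "$statefile" ) &` and send that subshell SIGTERM: the sleeper is killed and reaped, the temporary directory is removed, and the subshell exits with status 1. Pressing ctrl-c during a foreground run goes through the same handler via SIGINT.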

For this patch,
Reviewed-by: Darrick J. Wong <djwong@xxxxxxxxxx>

--D

> 
> Thanks,
> Amir.


