Re: [PATCH] generic/019: kill background processes on interrupt

Dave Chinner <david@xxxxxxxxxxxxx> · Mon, 18 Apr 2022 10:02:06 +1000

On Wed, Apr 13, 2022 at 10:13:35AM +0300, Amir Goldstein wrote:
> On Wed, Apr 13, 2022 at 4:53 AM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> >
> > On Tue, Apr 12, 2022 at 10:25:00PM +0800, Zorro Lang wrote:
> > > On Tue, Apr 12, 2022 at 02:59:42PM +0200, David Disseldorp wrote:
> > > > On Mon, 11 Apr 2022 15:48:33 +1000, Dave Chinner wrote:
> > > >
> > > > > From: Dave Chinner <dchinner@xxxxxxxxxx>
> > > > >
> > > > > If you ctrl-c generic/019, it leaves fsstress processes running.
> > > > > Kill them in the cleanup function so that they don't have to be
> > > > > manually killed after interrupting the test.
> > > > >
> > > > > While touching the _cleanup() function, make it do everything that
> > > > > the generic _cleanup function it overrides does and fix the
> > > > > indenting.
> > > > >
> > > > > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> > > > > ---
> > > > >  tests/generic/019 | 6 ++++--
> > > > >  1 file changed, 4 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/tests/generic/019 b/tests/generic/019
> > > > > index db56dac1..cda107f4 100755
> > > > > --- a/tests/generic/019
> > > > > +++ b/tests/generic/019
> > > > > @@ -53,8 +53,10 @@ stop_fail_scratch_dev()
> > > > >  # Override the default cleanup function.
> > > > >  _cleanup()
> > > > >  {
> > > > > -    disallow_fail_make_request
> > > > > -    rm -f $tmp.*
> > > > > + kill $fs_pid $fio_pid &> /dev/null
> > > > > + disallow_fail_make_request
> > > > > + cd /
> > > > > + rm -r -f $tmp.*
> > > > >  }
> > > > >
> > > > >  RUN_TIME=$((20+10*$TIME_FACTOR))
> > > >
> > > > Might be worth unset'ing the "fs_pid" and "fio_pid" variables after the
> > > > wait, but should be fine as-is:
> > >
> > > I agree. Better to avoid killing other system processes. Or how about this place
> > > does (avoid killing system useful processes):
> > > $KILLALL_PROG -q $FSSTRESS_PROG
> > > $KILLALL_PROG -q $FIO_PROG

In general I dislike the use of killall because because it is a
barrier to being able to run multiple tests in parallel on the same
machine. One of the things I'd really like to be able to do is run
tests in parallel and widespread use of killall is one of the
primary reasons we can't do that.

So, yeah, what one person sees as a "better method" another person
will see as having significant drawbacks and problems.

> > > Another picky question is, do we need to use a while loop checking, until the
> > > processes really get killed? :)
> >
> > Do we really need to paint the bikeshed over how best to kill a
> > process? I don't have time to do that, this is just a drive-by fix
> > that works for me....
> 
> This is not a kind response to reviewers.
> Does a "drive-by fix" get exempt from the review process?

No. But suggesting that the fix has to take into account unnecessary
and increasingly esoteric edge cases is not "review" - that's called
"moving the goal posts".

IOWs, my comment is based on the observation that a simple fix to a
problem that has been encountered during some other work gets turned
into a "fix the world" discussion that implies the original patch
author needs to fix the world rather than the single problem they
encountered and sent a patch for.

We talk about "drive by contributions" repeatedly in terms of how to
review/accept them without placing undue burdens on the people
sending such patches.  That fix might not be a 100% perfect fix but
it is still an improvement and it addresses that person's immediate
problem. That's largely all that matters to the person sending the
patch.

> The review comments are legit even if they could be dismissed
> on technical grounds, because the risk of pid wraparound is quite low.
>
> I don't think this is about "bikeshed over how best to kill a process"
> I think this is about how to have better test cleanup practices.
> It would have been nice to have better isolation by having fstests
> run a test without a control group and cleanup the control group
> processes after the test if someone wants to take on this task.

And now your are demonstrating my point - you've gone off into
bikeshedding over how to change the cleanup processes. This is
exactly what I pushed back against in the first place.

Review is not about turning a simple fix into a huge chunk of work
that the original patch author neither cares about nor has time or
resources to perform. It doesn't encourage them to send other simple
5 minute fixup patches as they encounter them, either.

Requiring patches to be perfect rather than just an *improvement* is
the problem here.  Turning a simple fix into a bikeshed session that
ends up going nowhere is in no-ones best interests.  Taking the fix
as it stands and then noting that there's a wider problem that needs
to be addressed (as Darrick has already described) is a much better
way forward. Address that bigger problem in it's own thread that
starts with patches that demonstrate solutions to that problem,
don't morph the "review" process into a discussion that implies that
the person who sent a simple fix is now on the hook to do a major
piece of cleanup work....

Cheers,

Dave.

-- 
Dave Chinner
david@xxxxxxxxxxxxx