RE: [pynfs RFC PATCH] nfs4.0/testserver.py: don't return an error when tests fail

> -----Original Message-----
> From: Mkrtchyan, Tigran [mailto:tigran.mkrtchyan@xxxxxxx]
> Sent: Thursday, February 23, 2023 11:38 AM
> To: Jeff Layton <jlayton@xxxxxxxxxx>
> Cc: Frank Filz <ffilzlnx@xxxxxxxxxxxxxx>; J. Bruce Fields <bfields@xxxxxxxxxxxx>;
> Dai Ngo <dai.ngo@xxxxxxxxxx>; linux-nfs <linux-nfs@xxxxxxxxxxxxxxx>
> Subject: Re: [pynfs RFC PATCH] nfs4.0/testserver.py: don't return an error when
> tests fail
> 
> 
> 
> ----- Original Message -----
> > From: "Jeff Layton" <jlayton@xxxxxxxxxx>
> > To: "Frank Filz" <ffilzlnx@xxxxxxxxxxxxxx>, "J. Bruce Fields"
> <bfields@xxxxxxxxxxxx>, "Dai Ngo" <dai.ngo@xxxxxxxxxx>
> > Cc: "linux-nfs" <linux-nfs@xxxxxxxxxxxxxxx>
> > Sent: Thursday, 23 February, 2023 18:08:19
> > Subject: Re: [pynfs RFC PATCH] nfs4.0/testserver.py: don't return an error
> when tests fail
> 
> > On Thu, 2023-02-23 at 08:22 -0800, Frank Filz wrote:
> >> > From: Jeff Layton [mailto:jlayton@xxxxxxxxxx]
> >>
> >> > This script was originally changed in eb3ba0b60055 ("Have
> >> > testserver.py have non-zero exit code if any tests fail"), but the
> >> > same change wasn't made to the 4.1 testserver.py script.
> >> >
> >> > There also wasn't much explanation for it, and it makes it
> >> > difficult to tell whether the test harness itself failed, or
> >> > whether there was a failure in a requested test.
> >> >
> >> > Stop the 4.0 testserver.py from exiting with an error code when a
> >> > test fails, so that a successful return means only that the test
> >> > harness itself worked, not that every requested test passed.
> >> >
> >> > Cc: Frank Filz <ffilzlnx@xxxxxxxxxxxxxx>
> >> > Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
> >> > ---
> >> >  nfs4.0/testserver.py | 2 --
> >> >  1 file changed, 2 deletions(-)
> >> >
> >> > I'm not sure about this one. I've worked around this in kdevops
> >> > for now, but it would really be preferable if it worked this way,
> >> > imo. If this isn't acceptable, maybe we can add a new option that
> >> > enables this behavior?
> >> >
> >> > Frank, what was the original rationale for eb3ba0b60055 ?
> >>
> >> We needed a way for CI to easily detect failure of pynfs. I'm not
> >> sure how helpful it is since Ganesha does fail some tests...
> >>
> >> It might be helpful to have some helpers for CI to use, or an option
> >> that causes pynfs to report in a way that's much easier for CI to
> >> determine if pynfs succeeded or not.
> >>
> >
> > That's exactly the issue I had when working with this for kdevops. It
> > runs testserver.py via ansible, and when it gets a non-zero exit code,
> > the run aborts without doing anything further. I have it ignoring the
> > return code from testserver.py for now, but that's not ideal since I
> > can't catch actual problems with the test harness.
> >
> > I have testserver.py output the result to JSON, and then analyze that to
> > see if anything failed. That also gives us what you were asking for in
> > your other email -- the ability to filter out known failures. Here's
> > what I have so far, but I'd like to expand it to highlight other
> > behavior changes:
> >
> > https://github.com/linux-kdevops/kdevops/blob/master/scripts/workflows/pynfs/check_pynfs_results.py
> >
> > It may make sense to move that script into pynfs itself.
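As an illustration of that kind of check (this is not the actual kdevops script; the JSON layout and the test codes below are assumptions for the sketch), a known-failures filter could look something like this:

```python
#!/usr/bin/env python3
# Illustrative sketch only -- not the real check_pynfs_results.py.
# Assumes a results file shaped like {"CODE1": "pass", "CODE2": "failure", ...};
# the JSON that pynfs actually emits may be structured differently.
import json
import sys

KNOWN_FAILURES = {"WRT1", "LOCK24"}  # hypothetical codes we tolerate

def main(path):
    with open(path) as f:
        results = json.load(f)
    # Anything non-passing that isn't on the known-failures list is fatal.
    unexpected = [code for code, outcome in results.items()
                  if outcome != "pass" and code not in KNOWN_FAILURES]
    # Flag known failures that now pass, so the list can be trimmed.
    for code in sorted(KNOWN_FAILURES):
        if results.get(code) == "pass":
            print(f"note: known failure {code} now passes")
    if unexpected:
        for code in unexpected:
            print(f"FAIL: {code}: {results[code]}", file=sys.stderr)
        sys.exit(1)

if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: check_results.py <results.json>")
    main(sys.argv[1])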
> >
> > If there is CI that depends on this behavior, then I'd be interested to
> > hear how they are dealing with failed tests. Do they just not run the
> > tests that always fail?
> 
> 
> Same here... The test generates an xunit report for us, so the failure
> information is in the report, and we always have to run it as:
> 
> ```
> ./testserver.py -v --noinit --xml="${WORKSPACE}/xunit-report-v41.xml"
> ${SUT}:${TEST_PATH} all $NFS41_INCLUDES $NFS41_EXCLUDES_OPTION ||
> true
> ```
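(For reference, the `|| true` there swallows testserver.py's failing exit status so the CI step doesn't abort; the xunit report then carries the actual per-test results.)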

OK, maybe we just need to revert this behavior. Honestly, I'm not sure the Ganesha CI is in good shape, so if this reversion prompts an examination of why things aren't working as expected, well, then we need to revisit those things.

I still would argue against changing the meaning of "all"...

Frank

> >
> >> Hmm, one thing that would help is to be able to flag a set of tests
> >> that should not constitute a CI failure (known errors) but that we
> >> want to keep running because of what they exercise, or to more
> >> readily detect that they have been fixed.
> 
> yeah, an option might do the job.
> 
> Tigran.
> 
> >>
> >
> > The right way to do that is the same way xfstests works: you check
> > whether the conditions for a test are favorable, and just skip it if
> > they aren't.
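A generic sketch of that xfstests-style probe-then-skip pattern (using unittest purely for illustration; pynfs's own test API differs, and the probe below is a hypothetical placeholder):

```python
# Illustrative only: shows the "check preconditions, skip if unmet" pattern.
import os
import unittest

def server_supports_delegations():
    # Hypothetical probe; a real one would query the server's capabilities
    # rather than read an environment variable.
    return os.environ.get("NFS_SERVER_SUPPORTS_DELEG") == "1"

class DelegationTests(unittest.TestCase):
    @unittest.skipUnless(server_supports_delegations(),
                         "server does not support delegations")
    def test_read_delegation(self):
        # A skipped test is reported as a skip, not a failure, so CI
        # stays green on servers that lack the feature.
        self.assertTrue(True)  # placeholder body

if __name__ == "__main__":
    unittest.main()
```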
> >
> >> > diff --git a/nfs4.0/testserver.py b/nfs4.0/testserver.py
> >> > index f2c41568e5c7..4f4286daa657 100755
> >> > --- a/nfs4.0/testserver.py
> >> > +++ b/nfs4.0/testserver.py
> >> > @@ -387,8 +387,6 @@ def main():
> >> >
> >> >      if nfail < 0:
> >> >          sys.exit(3)
> >> > -    if nfail > 0:
> >> > -        sys.exit(2)
> >> >
> >> >  if __name__ == "__main__":
> >> >      main()
> >> > --
> >> > 2.39.2
> >>
> >
> > --
> > Jeff Layton <jlayton@xxxxxxxxxx>
