On 01/19/2016 07:08 PM, Raghavendra Talur wrote:
>
> On Tue, Jan 19, 2016 at 5:21 PM, Atin Mukherjee <amukherj@xxxxxxxxxx
> <mailto:amukherj@xxxxxxxxxx>> wrote:
>
> On 01/19/2016 10:45 AM, Emmanuel Dreyfus wrote:
> > Hi
> >
> > Spurious regression failures make developers frustrated. One submits a
> > change and gets completely unrelated failures. The only way out is to
> > retrigger regression until it passes, a boring and time-wasting task.
> > Sometimes it is only after 4 or 5 failed runs that the submitter
> > realizes there is a real issue and looks at it, which is a waste of
> > time and resources.
> >
> > The fact that we run regression on multiple platforms makes the
> > situation worse. If you have a 10% chance of hitting a spurious failure
> > on Linux and a 20% chance of hitting one on NetBSD (numbers chosen at
> > random), you get roughly one failure for every four submissions (a
> > rough prediction, since the input numbers are made up, but you get the
> > idea).
> >
> > Two solutions are proposed:
> >
> > 1) Do not run unreliable tests, as proposed by Raghavendra Talur:
> > http://review.gluster.org/13173
> >
> > I have nothing against the idea, but I voted down the change because it
> > fails to address the need for different test blacklists on different
> > platforms: we do not have the same unreliable tests on Linux and NetBSD.
>
> Why I prefer this solution:
> a. Allowing tests to be re-run until they pass leads to complacency in
> how tests are written.
> b. A test is bad if it is not deterministic, and running a bad test has
> *no* value. We are wasting time even if the test runs for only a few
> seconds.

IMHO, most of our tests are non-deterministic, and that's why my vote
would be for option 2 over 1, as that reduces the probability of
retriggers.

> c. I propose another method to overcome the technical difficulty of
> having blacklists for different platforms. We could use "[K[a-z]*-]*"
> as a prefix for test names, where [a-z]* could be L or N to signify
> that the test is bad on Linux or NetBSD respectively. The run-tests.sh
> script can be made intelligent enough to determine the host OS and
> skip such tests.
>
> > 2) Add a regression option to retry a failed test once, and to validate
> > the regression if the second attempt passes, as I proposed:
> > http://review.gluster.org/13245
> >
> > The idea is basically to do automatically what every submitter has been
> > doing: retry without a thought when regression fails. The benefit of
> > this approach is also that it gives us a better view of which tests
> > failed because of the change and which failed because they were
> > unreliable.
> >
> > The retry feature is optional and is triggered by the -r flag to
> > run-tests.sh. I intend to use it on NetBSD regression to reduce the
> > number of failures that annoy people. It could be used on Linux
> > regression too, though I do not plan to touch that on my own.

+1 to option 2

> > Please, people, tell us which approach you prefer.

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel
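For concreteness, here is a rough sketch of how the per-platform skip from
option 1 could be wired into run-tests.sh. The K<letter>- prefix convention,
the should_skip helper and the test layout below are assumptions made for
illustration; they are not the actual code in http://review.gluster.org/13173:

    #!/bin/sh
    # Hypothetical sketch: skip tests whose filename carries a "known bad"
    # prefix for the platform we are running on.  The K<letter>- naming and
    # the should_skip helper are assumed, not taken from run-tests.sh.

    case "$(uname -s)" in
        Linux)  platform=L ;;
        NetBSD) platform=N ;;
        *)      platform=X ;;
    esac

    should_skip () {
        # $1 is a test path such as tests/basic/KLN-mount.t (made-up name)
        case "$(basename "$1")" in
            K*${platform}*-*) return 0 ;;   # marked bad on this platform
            *)                return 1 ;;
        esac
    }

    for t in tests/*/*.t; do
        if should_skip "$t"; then
            echo "SKIP $t (known bad on $(uname -s))"
            continue
        fi
        echo "RUN  $t"        # the real script would run the test here
    done

The point is just that a single naming convention can carry per-platform
information, so one blacklist mechanism can serve both Linux and NetBSD.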
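And a minimal sketch of the retry-once behaviour behind option 2's -r flag.
The flag itself is part of the proposal; the option parsing, the run_one
helper and the use of prove below are an illustrative guess, not the code
in http://review.gluster.org/13245:

    #!/bin/sh
    # Sketch of retry-once: with -r, a failed test is run a second time and
    # is only counted as a failure if both attempts fail.

    retry=0
    while getopts r opt; do
        case "$opt" in
            r) retry=1 ;;
            *) echo "usage: $0 [-r] test.t ..." >&2; exit 2 ;;
        esac
    done
    shift $((OPTIND - 1))

    run_one () {
        prove -v "$1" && return 0
        if [ "$retry" -eq 1 ]; then
            echo "RETRY $1 (first attempt failed, possibly spurious)"
            prove -v "$1" && return 0
        fi
        return 1
    }

    failed=""
    for t in "$@"; do
        run_one "$t" || failed="$failed $t"
    done

    if [ -n "$failed" ]; then
        echo "Tests that failed even after retry:$failed"
        exit 1
    fi

A side benefit shows up in the logs: tests that pass only on the second
attempt are exactly the ones worth flagging later as unreliable.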