Re: good job on fixing heavy hitters in spurious regressions

Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx> · Fri, 08 May 2015 22:57:24 +0530

On 05/08/2015 06:45 PM, Shyam wrote:
On 05/08/2015 08:16 AM, Jeff Darcy wrote:
Here are some of the things that I can think of: 0) Maintainers
should also maintain tests that are in their component.

It is not possible for me as glusterd co-maintainer to 'maintain'
tests that are added under tests/bugs/glusterd. Most of them don't
test core glusterd functionality.  They are almost always tied to a
particular feature whose implementation had bugs in its glusterd code.
I would expect the test authors (esp. the more recent ones) to chip
in.

Good point.  Nobody should be penalized for having code that everyone
else touches (or rewarded for having code that nobody dares to).
First responsibility for debugging a regression-test failure lies
with the owner of the patch that failed.  If they determine that the
failure is spurious - which is easy if it's already on a list - then
responsibility falls to the owner of the test.  Either should be able
to draw on the expertise of others in the group, but that doesn't
shift *responsibility*.  Only when a problem has been tracked down to
a particular piece of production code should responsibility move
again - either to the person whose earlier patch caused the breakage,
or to the subsystem maintainer.

+1, could not have said it better, but chipping in my vote for the
order, owner of patch first -> owner of test -> maintainer of module,
with the said responsibility in place.

I don't know, man. This order has failed me many times in the past with
spurious failures, both for 3.6.0 (I sent around 15 patches at that time
to fix spurious failures) and now for 3.7.0. I also had to spend a fair
amount of time on this, including the long weekend, which kept me away
from the ec deliverables for this release. I am very happy that a lot
more people helped solve the problems this time around, because I feel
they experienced the pain of broken regressions and now realize the
importance of fixing them.
I'll tell you what experiences led me to suggest that the maintainer
take this responsibility.

Say I submit a patch for a new component, or one changing the log level
of one of the logs, which has not a single caller once it is moved from
INFO -> DEBUG. So the code is not going to be executed at all. Yet the
regressions fail. I am 100% sure the failure has nothing to do with my
patch. I have neither the time nor the expertise to debug a test that I
have no clue about, so the least I can do is notify the people who may
be able to do something about it, i.e. the owner of the test or the
maintainer of the module. You think: let's ask the owner of the test
what the problem is. But the owner of the test has moved on to a
different component and is busy with their own work. So you are left
with going to the maintainer, who tells you what the problem is and why
as soon as you show the test number, and you end up wondering why you
didn't ask him/her first.

Mostly this is just common sense.  Perhaps the change that's needed
is to make the fixing of likely-spurious test failures a higher
priority than adding new features.  That has to be reflected not
only in Bugzilla, but also in how we schedule individual developers'
time and evaluate their progress toward goals.
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel
