Re: spurious regression failures again!

Joseph Fernandes <josferna@xxxxxxxxxx> · Thu, 17 Jul 2014 01:28:14 -0400 (EDT)

Hi Avra,

Just clarifying things here,
1) When testing with the setup provided by Justin, the only time bug-1112559.t failed was right after mgmt_v3-locks.t had failed in the previous regression run. The mail attached to my previous mail was just an OBSERVATION and NOT an INFERENCE that the failure of mgmt_v3-locks.t was the root cause of the bug-1112559.t failure. I am NOT jumping the gun or drawing any conclusion here. It's just an OBSERVATION. And thanks for the clarification on why mgmt_v3-locks.t is failing.

2) I agree with you that the cleanup script needs to kill all gluster* processes. And it's also true that the port range used by gluster for bricks is unique.
But bug-1112559.t fails only because no port is available to start the snap brick. This suggests that some process (gluster or non-gluster)
is still using the port.
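
One way to confirm this on a slave right after a failed run is to probe the brick port range and see which ports cannot be bound. The following is only a minimal sketch, not part of the test framework: port_is_free and busy_ports are hypothetical helper names, and the range passed in is an assumption that must match the ports the bricks actually use on that machine.

```python
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can bind to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # The bind was refused, e.g. because another process holds the port.
            return False
        return True


def busy_ports(start, end, host="0.0.0.0"):
    """List every port in [start, end] that cannot be bound at the moment."""
    return [port for port in range(start, end + 1)
            if not port_is_free(port, host)]
```

For example, busy_ports(49152, 49251) would list the unavailable ports in that range (49152 as the first brick port is an assumption here). The sketch only reports that a bind fails; it does not say which process owns the port. For that, something like `ss -ltnp` or `lsof -i` on the slave would still be needed.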

3) And finally, it is not true that bug-1112559.t always fails on its own: in the links you provided there are cases where other test cases failed earlier on the same testing machine (slave26). By this I am not claiming that those failures are the root cause of the bug-1112559.t failure.
As stated earlier, it's a notable OBSERVATION (keeping in mind point 2 about ports and cleanup).

I have done nearly 30 runs on slave30 and bug-1112559.t failed only once (as stated in point 1). I am continuing with more runs. The only problem is that the bug-1112559.t failure is spurious and there is no deterministic way of reproducing it.
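
To get a rough idea of how many more runs are worth doing, the observed rate can be turned into a run count. This is a back-of-the-envelope sketch only: it assumes the runs are independent and that one failure in about 30 runs is a fair estimate of the per-run failure rate, which a single observation cannot really establish. runs_needed is a hypothetical helper name.

```python
import math


def runs_needed(failure_rate, confidence=0.95):
    """Smallest n with 1 - (1 - failure_rate)**n >= confidence.

    In other words: how many independent runs are needed to see at least
    one failure with the given probability.
    """
    if not 0.0 < failure_rate < 1.0:
        raise ValueError("failure_rate must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))
```

Under those assumptions, runs_needed(1 / 30) comes to 89, so roughly 90 runs would be needed to expect at least one more failure with 95% probability.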

Will keep everyone posted on the results.

Regards,
Joe

----- Original Message -----
From: "Avra Sengupta" <asengupt@xxxxxxxxxx>
To: "Joseph Fernandes" <josferna@xxxxxxxxxx>, "Pranith Kumar Karampuri" <pkarampu@xxxxxxxxxx>
Cc: "Gluster Devel" <gluster-devel@xxxxxxxxxxx>, "Varun Shastry" <vshastry@xxxxxxxxxx>, "Justin Clift" <justin@xxxxxxxxxxx>
Sent: Wednesday, July 16, 2014 1:03:21 PM
Subject: Re:  spurious regression failures again!

Joseph,

I am not sure I understand how this is affecting the spurious failure of 
bug-1112559.t. As per the mail you have attached, and according to your 
analysis,  bug-1112559.t fails because a cleanup hasn't happened 
properly after a previous test-case failed and in your case there was a 
crash as well.

Now, out of all the times bug-1112559.t has failed, most of the time it's 
the only test case failing and there isn't any crash. Below are the 
regression runs that Pranith had sent for the same.

http://build.gluster.org/job/rackspace-regression-2GB/541/consoleFull

http://build.gluster.org/job/rackspace-regression-2GB-triggered/173/consoleFull

http://build.gluster.org/job/rackspace-regression-2GB-triggered/172/consoleFull

http://build.gluster.org/job/rackspace-regression-2GB/543/console

In all of the above bug-1112559.t is the only test case that fails and 
there is no crash.

So what I fail to understand here is this: if this particular testcase fails 
independently as well as with other testcases, then how can we conclude 
that some other failing testcase is not doing a cleanup properly 
and that this is the reason for bug-1112559.t failing?

mgmt_v3-locks.t fails because glusterd takes more time to register a 
node going down, and hence the peer status doesn't return what the 
testcase expects it to. It's a race. The testcase ends with a cleanup 
routine like every other testcase, that kills all gluster and glusterfsd 
processes, which might be using any brick ports. So could you please 
explain how, or which, process still uses the brick ports that the snap 
bricks are trying to use, leading to the failure of bug-1112559.t?
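
The race described above is the usual "check once, too early" pattern, and the general remedy is to poll for the expected state until a deadline instead of checking a single time. The following is only an illustration of that idea, not the actual test code: wait_until and the check callable are hypothetical names, and the timeout and interval values are arbitrary.

```python
import time


def wait_until(check, expected, timeout=20.0, interval=1.0):
    """Poll check() until it returns expected or the timeout expires.

    Returns True as soon as the expected value is seen, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        if check() == expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A testcase would pass in a function that reads the connected peer count and the count it expects after the node goes down, so that a slow state change in glusterd no longer fails the check outright.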

Regards,
Avra

On 07/15/2014 09:57 PM, Joseph Fernandes wrote:
> Just pointing out ,
>
> 2) tests/basic/mgmt_v3-locks.t - Author: Avra
> http://build.gluster.org/job/rackspace-regression-2GB-triggered/375/consoleFull
>
> This is a similar kind of error to the one I saw while testing the spurious failure of tests/bugs/bug-1112559.t
>
> Please refer to the attached mail.
>
> Regards,
> Joe
>
>
>
> ----- Original Message -----
> From: "Pranith Kumar Karampuri" <pkarampu@xxxxxxxxxx>
> To: "Joseph Fernandes" <josferna@xxxxxxxxxx>
> Cc: "Gluster Devel" <gluster-devel@xxxxxxxxxxx>, "Varun Shastry" <vshastry@xxxxxxxxxx>
> Sent: Tuesday, July 15, 2014 9:34:26 PM
> Subject: Re:  spurious regression failures again!
>
>
> On 07/15/2014 09:24 PM, Joseph Fernandes wrote:
>> Hi Pranith,
>>
>> Could you please share the link of the console output of the failures.
> Added them inline. Thanks for reminding :-)
>
> Pranith
>> Regards,
>> Joe
>>
>> ----- Original Message -----
>> From: "Pranith Kumar Karampuri" <pkarampu@xxxxxxxxxx>
>> To: "Gluster Devel" <gluster-devel@xxxxxxxxxxx>, "Varun Shastry" <vshastry@xxxxxxxxxx>
>> Sent: Tuesday, July 15, 2014 8:52:44 PM
>> Subject:  spurious regression failures again!
>>
>> hi,
>>        We have 4 tests failing once in a while causing problems:
>> 1) tests/bugs/bug-1087198.t - Author: Varun
> http://build.gluster.org/job/rackspace-regression-2GB-triggered/379/consoleFull
>> 2) tests/basic/mgmt_v3-locks.t - Author: Avra
> http://build.gluster.org/job/rackspace-regression-2GB-triggered/375/consoleFull
>> 3) tests/basic/fops-sanity.t - Author: Pranith
> http://build.gluster.org/job/rackspace-regression-2GB-triggered/383/consoleFull
>> Please take a look at them and post updates.
>>
>> Pranith
>> _______________________________________________
>> Gluster-devel mailing list
>> Gluster-devel@xxxxxxxxxxx
>> http://supercolony.gluster.org/mailman/listinfo/gluster-devel