----- Original Message -----
> From: "Kaushal M" <kshlmster@xxxxxxxxx>
> To: "Justin Clift" <justin@xxxxxxxxxxx>, "Gluster Devel" <gluster-devel@xxxxxxxxxxx>
> Sent: Thursday, May 22, 2014 6:04:29 PM
> Subject: Re: bug-857330/normal.t failure
>
> Thanks Justin, I found the problem. The VM can be deleted now.
>
> Turns out there was more than enough time for the rebalance to complete,
> but we hit a race which caused a command to fail.
>
> The particular test that failed waits for the rebalance to finish. It does
> this by running a 'gluster volume rebalance <> status' command and checking
> the result. The EXPECT_WITHIN function runs this command until we have a
> match, the command fails, or the timeout expires.
>
> For a rebalance status command, glusterd sends a request to the rebalance
> process (as a brick_op) to get the latest stats. It had done the same in
> this case as well. But while glusterd was waiting for the reply, the
> rebalance completed and the process stopped itself. This caused the rpc
> connection between glusterd and the rebalance process to close, which
> caused all pending requests to be unwound as failures, which in turn led
> to the command failing.

Do you think we can print the status of the process as 'not-responding' when
such a thing happens, instead of failing the command?

Pranith

> I cannot think of a way to avoid this race from within glusterd. For this
> particular test, we could avoid using the 'rebalance status' command if we
> directly checked the rebalance process state using its pid etc. I don't
> particularly approve of this approach, as I think I used the 'rebalance
> status' command for a reason. But I currently cannot recall the reason, and
> if I cannot come up with it soon, I wouldn't mind changing the test to
> avoid rebalance status.
>
> ~kaushal
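[For illustration: a minimal sketch of the direct, pid-based check Kaushal
mentions above as an alternative to 'rebalance status'. The pidfile path,
volume name, and timeout below are assumptions made for the sketch, not what
the test framework actually uses.]

    #!/bin/bash
    # Sketch: wait for the rebalance process to exit by checking its pid
    # directly, instead of asking glusterd via 'rebalance status'.
    # ASSUMPTION: the pidfile location varies between GlusterFS versions;
    # adjust the glob to wherever glusterd writes it on your setup.

    V0=patchy                                                  # hypothetical volume name
    PIDFILE_GLOB="/var/lib/glusterd/vols/$V0/rebalance/*.pid"  # assumed path
    TIMEOUT=300

    rebalance_running () {
        local pidfile pid
        for pidfile in $PIDFILE_GLOB; do
            [ -e "$pidfile" ] || continue
            pid=$(cat "$pidfile")
            # kill -0 only checks that the process still exists
            if kill -0 "$pid" 2>/dev/null; then
                return 0
            fi
        done
        return 1    # no live rebalance process found
    }

    end=$(( $(date +%s) + TIMEOUT ))
    while rebalance_running; do
        if [ "$(date +%s)" -ge "$end" ]; then
            echo "rebalance still running after ${TIMEOUT}s" >&2
            exit 1
        fi
        sleep 1
    done
    echo "rebalance process has exited"

[The trade-off: a pid check only shows that the rebalance process has exited,
not whether the rebalance actually completed successfully, which may well be
the sort of reason the test uses 'rebalance status' in the first place.]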
>
> On Thu, May 22, 2014 at 5:22 PM, Justin Clift <justin@xxxxxxxxxxx> wrote:
> > On 22/05/2014, at 12:32 PM, Kaushal M wrote:
> > > I haven't yet. But I will.
> > >
> > > Justin,
> > > Can I take a peek inside the vm?
> >
> > Sure.
> >
> > IP: 23.253.57.20
> > User: root
> > Password: foobar123
> >
> > The stdout log from the regression test is in /tmp/regression.log.
> >
> > The GlusterFS git repo is in /root/glusterfs. Um, you should be
> > able to find everything else pretty easily.
> >
> > Btw, this is just a temp VM, so feel free to do anything you want
> > with it. When you're finished with it let me know so I can delete
> > it. :)
> >
> > + Justin
> >
> > ~kaushal
> >
> > On Thu, May 22, 2014 at 4:53 PM, Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx> wrote:
> > Kaushal,
> > Rebalance status command seems to be failing sometimes. I sent a mail about
> > such a spurious failure earlier today. Did you get a chance to look at the
> > logs and confirm that rebalance didn't fail and it is indeed a timeout?
> >
> > Pranith
> > ----- Original Message -----
> > > From: "Kaushal M" <kshlmster@xxxxxxxxx>
> > > To: "Pranith Kumar Karampuri" <pkarampu@xxxxxxxxxx>
> > > Cc: "Justin Clift" <justin@xxxxxxxxxxx>, "Gluster Devel" <gluster-devel@xxxxxxxxxxx>
> > > Sent: Thursday, May 22, 2014 4:40:25 PM
> > > Subject: Re: bug-857330/normal.t failure
> > >
> > > The test is waiting for rebalance to finish. This is a rebalance with
> > > some actual data, so it could have taken a long time to finish. I did
> > > set a pretty high timeout, but it seems like it's not enough for the
> > > new VMs.
> > >
> > > Possible options are:
> > > - Increase this timeout further
> > > - Reduce the amount of data. Currently this is 100 directories with 10
> > >   files each, with file sizes between 10 KB and 500 KB.
> > >
> > > ~kaushal
> > >
> > > On Thu, May 22, 2014 at 3:59 PM, Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx> wrote:
> > > > Kaushal has more context about these; CCed. Keep the setup until he
> > > > responds so that he can take a look.
> > > >
> > > > Pranith
> > > > ----- Original Message -----
> > > > > From: "Justin Clift" <justin@xxxxxxxxxxx>
> > > > > To: "Pranith Kumar Karampuri" <pkarampu@xxxxxxxxxx>
> > > > > Cc: "Gluster Devel" <gluster-devel@xxxxxxxxxxx>
> > > > > Sent: Thursday, May 22, 2014 3:54:46 PM
> > > > > Subject: bug-857330/normal.t failure
> > > > >
> > > > > Hi Pranith,
> > > > >
> > > > > Ran a few VMs with your Gerrit CR 7835 applied, and in "DEBUG"
> > > > > mode (I think).
> > > > >
> > > > > One of the VMs had a failure in bug-857330/normal.t:
> > > > >
> > > > > Test Summary Report
> > > > > -------------------
> > > > > ./tests/basic/rpm.t              (Wstat: 0 Tests: 0 Failed: 0)
> > > > >   Parse errors: Bad plan.  You planned 8 tests but ran 0.
> > > > > ./tests/bugs/bug-857330/normal.t (Wstat: 0 Tests: 24 Failed: 1)
> > > > >   Failed test: 13
> > > > > Files=230, Tests=4369, 5407 wallclock secs ( 2.13 usr 1.73 sys + 941.82 cusr 645.54 csys = 1591.22 CPU)
> > > > > Result: FAIL
> > > > >
> > > > > Seems to be this test:
> > > > >
> > > > >   COMMAND="volume rebalance $V0 status"
> > > > >   PATTERN="completed"
> > > > >   EXPECT_WITHIN 300 $PATTERN get-task-status
> > > > >
> > > > > Is this one on your radar already?
> > > > >
> > > > > Btw, this VM is still online. Can give you access to retrieve logs
> > > > > if useful.
> > > > >
> > > > > + Justin
> > > > >
> > > > > --
> > > > > Open Source and Standards @ Red Hat
> > > > >
> > > > > twitter.com/realjustinclift
> >
> > --
> > Open Source and Standards @ Red Hat
> >
> > twitter.com/realjustinclift

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-devel
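[For readers without the test framework handy: EXPECT_WITHIN and
get-task-status are helpers defined in the GlusterFS test suite. The sketch
below is only a simplified approximation of that polling pattern, written to
show how a mid-poll command failure (the race described above) aborts the
wait early; the function bodies, volume name, and output matching are
assumptions, not the framework's actual code.]

    #!/bin/bash
    # Simplified approximation of the polling the test relies on. This is
    # NOT the GlusterFS test framework's EXPECT_WITHIN/get-task-status;
    # the details below are illustrative assumptions.

    V0=patchy    # hypothetical volume name

    get_task_status () {
        # Print the rebalance status output; the caller matches it
        # against the expected pattern.
        gluster volume rebalance "$V0" status
    }

    expect_within () {
        local timeout=$1 pattern=$2 cmd=$3
        local end=$(( $(date +%s) + timeout )) out
        while [ "$(date +%s)" -lt "$end" ]; do
            if ! out=$("$cmd" 2>/dev/null); then
                # The window described above: if the status command itself
                # fails (e.g. the rebalance process exits and the rpc
                # connection drops mid-request), the wait aborts early
                # instead of retrying.
                echo "command '$cmd' failed" >&2
                return 1
            fi
            if grep -q "$pattern" <<< "$out"; then
                return 0
            fi
            sleep 1
        done
        echo "timed out waiting for '$pattern'" >&2
        return 1
    }

    expect_within 300 "completed" get_task_status

[Seen this way, Pranith's 'not-responding' suggestion would let the status
command report a soft state when the connection drops, so a loop like this
could keep polling instead of aborting.]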