----- Original Message -----
> From: "Kaushal M" <kshlmster@xxxxxxxxx>
> To: "Justin Clift" <justin@xxxxxxxxxxx>, "Gluster Devel" <gluster-devel@xxxxxxxxxxx>
> Sent: Thursday, May 22, 2014 6:04:29 PM
> Subject: Re: bug-857330/normal.t failure
>
> Thanks Justin, I found the problem. The VM can be deleted now.
>
> Turns out there was more than enough time for the rebalance to complete,
> but we hit a race which caused a command to fail.
>
> The particular test that failed waits for the rebalance to finish. It does
> this by running a 'gluster volume rebalance <> status' command and checking
> the result. The EXPECT_WITHIN function runs this command until we have a
> match, the command fails, or the timeout expires.
>
> For a rebalance status command, glusterd sends a request to the rebalance
> process (as a brick_op) to get the latest stats. It had done the same in
> this case as well. But while glusterd was waiting for the reply, the
> rebalance completed and the process stopped itself. This caused the rpc
> connection between glusterd and the rebalance process to close, which
> caused all pending requests to be unwound as failures, which in turn led
> to the command failing.

Do you think we can print the status of the process as 'not-responding' when
such a thing happens, instead of failing the command?

Pranith

> I cannot think of a way to avoid this race from within glusterd. For this
> particular test, we could avoid using the 'rebalance status' command if we
> directly checked the rebalance process state using its pid etc. I don't
> particularly approve of this approach, as I think I used the 'rebalance
> status' command for a reason. But I currently cannot recall the reason, and
> if I cannot come up with it soon, I wouldn't mind changing the test to
> avoid rebalance status.
>
> ~kaushal
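[For illustration: a minimal sketch of the direct, pid-based check Kaushal
mentions above as an alternative to 'rebalance status'. The pidfile path,
volume name, and timeout below are assumptions made for the sketch, not what
the test framework actually uses.]

    #!/bin/bash
    # Sketch: wait for the rebalance process to exit by checking its pid
    # directly, instead of asking glusterd via 'rebalance status'.
    # ASSUMPTION: the pidfile location varies between GlusterFS versions;
    # adjust the glob to wherever glusterd writes it on your setup.

    V0=patchy                                                  # hypothetical volume name
    PIDFILE_GLOB="/var/lib/glusterd/vols/$V0/rebalance/*.pid"  # assumed path
    TIMEOUT=300

    rebalance_running () {
        local pidfile pid
        for pidfile in $PIDFILE_GLOB; do
            [ -e "$pidfile" ] || continue
            pid=$(cat "$pidfile")
            # kill -0 only checks that the process still exists
            if kill -0 "$pid" 2>/dev/null; then
                return 0
            fi
        done
        return 1    # no live rebalance process found
    }

    end=$(( $(date +%s) + TIMEOUT ))
    while rebalance_running; do
        if [ "$(date +%s)" -ge "$end" ]; then
            echo "rebalance still running after ${TIMEOUT}s" >&2
            exit 1
        fi
        sleep 1
    done
    echo "rebalance process has exited"

[The trade-off: a pid check only shows that the rebalance process has exited,
not whether the rebalance actually completed successfully, which may well be
the sort of reason the test uses 'rebalance status' in the first place.]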
>
> On Thu, May 22, 2014 at 5:22 PM, Justin Clift <justin@xxxxxxxxxxx> wrote:
> > On 22/05/2014, at 12:32 PM, Kaushal M wrote:
> > > I haven't yet. But I will.
> > >
> > > Justin,
> > > Can I take a peek inside the vm?
> >
> > Sure.
> >
> > IP: 23.253.57.20
> > User: root
> > Password: foobar123
> >
> > The stdout log from the regression test is in /tmp/regression.log.
> >
> > The GlusterFS git repo is in /root/glusterfs. Um, you should be
> > able to find everything else pretty easily.
> >
> > Btw, this is just a temp VM, so feel free to do anything you want
> > with it. When you're finished with it let me know so I can delete
> > it. :)
> >
> > + Justin
> >
> > ~kaushal
> >
> > On Thu, May 22, 2014 at 4:53 PM, Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx> wrote:
> > Kaushal,
> > Rebalance status command seems to be failing sometimes. I sent a mail about
> > such a spurious failure earlier today. Did you get a chance to look at the
> > logs and confirm that rebalance didn't fail and it is indeed a timeout?
> >
> > Pranith
> > ----- Original Message -----
> > > From: "Kaushal M" <kshlmster@xxxxxxxxx>
> > > To: "Pranith Kumar Karampuri" <pkarampu@xxxxxxxxxx>
> > > Cc: "Justin Clift" <justin@xxxxxxxxxxx>, "Gluster Devel" <gluster-devel@xxxxxxxxxxx>
> > > Sent: Thursday, May 22, 2014 4:40:25 PM
> > > Subject: Re: bug-857330/normal.t failure
> > >
> > > The test is waiting for rebalance to finish. This is a rebalance with
> > > some actual data, so it could have taken a long time to finish. I did
> > > set a pretty high timeout, but it seems like it's not enough for the
> > > new VMs.
> > >
> > > Possible options are:
> > > - Increase this timeout further
> > > - Reduce the amount of data. Currently this is 100 directories with 10
> > >   files each, with file sizes between 10 KB and 500 KB.
> > >
> > > ~kaushal
> > >
> > > On Thu, May 22, 2014 at 3:59 PM, Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx> wrote:
> > > > Kaushal has more context about these; CCed. Keep the setup until he
> > > > responds so that he can take a look.
> > > >
> > > > Pranith
> > > > ----- Original Message -----
> > > > > From: "Justin Clift" <justin@xxxxxxxxxxx>
> > > > > To: "Pranith Kumar Karampuri" <pkarampu@xxxxxxxxxx>
> > > > > Cc: "Gluster Devel" <gluster-devel@xxxxxxxxxxx>
> > > > > Sent: Thursday, May 22, 2014 3:54:46 PM
> > > > > Subject: bug-857330/normal.t failure
> > > > >
> > > > > Hi Pranith,
> > > > >
> > > > > Ran a few VMs with your Gerrit CR 7835 applied, and in "DEBUG"
> > > > > mode (I think).
> > > > >
> > > > > One of the VMs had a failure in bug-857330/normal.t:
> > > > >
> > > > > Test Summary Report
> > > > > -------------------
> > > > > ./tests/basic/rpm.t              (Wstat: 0 Tests: 0 Failed: 0)
> > > > >   Parse errors: Bad plan.  You planned 8 tests but ran 0.
> > > > > ./tests/bugs/bug-857330/normal.t (Wstat: 0 Tests: 24 Failed: 1)
> > > > >   Failed test: 13
> > > > > Files=230, Tests=4369, 5407 wallclock secs ( 2.13 usr 1.73 sys + 941.82 cusr 645.54 csys = 1591.22 CPU)
> > > > > Result: FAIL
> > > > >
> > > > > Seems to be this test:
> > > > >
> > > > >   COMMAND="volume rebalance $V0 status"
> > > > >   PATTERN="completed"
> > > > >   EXPECT_WITHIN 300 $PATTERN get-task-status
> > > > >
> > > > > Is this one on your radar already?
> > > > >
> > > > > Btw, this VM is still online. Can give you access to retrieve logs
> > > > > if useful.
> > > > >
> > > > > + Justin
> > > > >
> > > > > --
> > > > > Open Source and Standards @ Red Hat
> > > > >
> > > > > twitter.com/realjustinclift
> >
> > --
> > Open Source and Standards @ Red Hat
> >
> > twitter.com/realjustinclift

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-devel
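[For readers without the test framework handy: EXPECT_WITHIN and
get-task-status are helpers defined in the GlusterFS test suite. The sketch
below is only a simplified approximation of that polling pattern, written to
show how a mid-poll command failure (the race described above) aborts the
wait early; the function bodies, volume name, and output matching are
assumptions, not the framework's actual code.]

    #!/bin/bash
    # Simplified approximation of the polling the test relies on. This is
    # NOT the GlusterFS test framework's EXPECT_WITHIN/get-task-status;
    # the details below are illustrative assumptions.

    V0=patchy    # hypothetical volume name

    get_task_status () {
        # Print the rebalance status output; the caller matches it
        # against the expected pattern.
        gluster volume rebalance "$V0" status
    }

    expect_within () {
        local timeout=$1 pattern=$2 cmd=$3
        local end=$(( $(date +%s) + timeout )) out
        while [ "$(date +%s)" -lt "$end" ]; do
            if ! out=$("$cmd" 2>/dev/null); then
                # The window described above: if the status command itself
                # fails (e.g. the rebalance process exits and the rpc
                # connection drops mid-request), the wait aborts early
                # instead of retrying.
                echo "command '$cmd' failed" >&2
                return 1
            fi
            if grep -q "$pattern" <<< "$out"; then
                return 0
            fi
            sleep 1
        done
        echo "timed out waiting for '$pattern'" >&2
        return 1
    }

    expect_within 300 "completed" get_task_status

[Seen this way, Pranith's 'not-responding' suggestion would let the status
command report a soft state when the connection drops, so a loop like this
could keep polling instead of aborting.]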