Re: bug-857330/normal.t failure

----- Original Message -----
From: "Krishnan Parthasarathi" <kparthas@xxxxxxxxxx>
To: "Gluster Devel" <gluster-devel@xxxxxxxxxxx>
Sent: Friday, May 23, 2014 9:31:27 AM
Subject: Re:  bug-857330/normal.t failure


----- Original Message -----
> On 22/05/2014, at 1:34 PM, Kaushal M wrote:
> > Thanks Justin, I found the problem. The VM can be deleted now.
> 
> Done. :)
> 
> 
> > Turns out, there was more than enough time for the rebalance to complete.
> > But we hit a race, which caused a command to fail.
> > 
> > The particular test that failed is waiting for rebalance to finish. It does
> > this by doing a 'gluster volume rebalance <> status' command and checking
> > the result. The EXPECT_WITHIN function runs this command till we have a
> > match, the command fails or the timeout happens.
> > 
> > For a rebalance status command, glusterd sends a request to the rebalance
> > process (as a brick_op) to get the latest stats. It had done the same in
> > this case as well. But while glusterd was waiting for the reply, the
> > rebalance completed and the process stopped itself. This caused the rpc
> > connection between glusterd and the rebalance process to close. This caused
> > all pending requests to be unwound as failures, which in turn led to the
> > command failing.
> > 
> > I cannot think of a way to avoid this race from within glusterd. For this
> > particular test, we could avoid using the 'rebalance status' command if we
> > directly checked the rebalance process state using its pid etc. I don't
> > particularly approve of this approach, as I think I used the 'rebalance
> > status' command for a reason. But I currently cannot recall the reason,
> > and if I cannot come up with it soon, I wouldn't mind changing the test to
> > avoid rebalance status.
> 
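A rough Python model of the race described above, for illustration only. It assumes EXPECT_WITHIN behaves as described in this thread (stop on a match, a failed command, or the timeout); the real EXPECT_WITHIN lives in the test framework and is not this code, and `expect_within`, `CommandFailed` and `RacyRebalance` are made-up names.

```python
import time
from typing import Callable


class CommandFailed(Exception):
    """The status command failed (e.g. its request was unwound)."""


def expect_within(timeout: float, expected: str,
                  status: Callable[[], str], interval: float = 0.01) -> bool:
    """Hypothetical model of EXPECT_WITHIN as described in this thread:
    poll until a match (True), a failed command or the timeout (False)."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if status() == expected:
                return True
        except CommandFailed:
            return False  # a failing command ends the wait early
        time.sleep(interval)
    return False


class RacyRebalance:
    """Hypothetical rebalance process that finishes migrating data while a
    status request is in flight and exits before sending the reply."""

    def __init__(self, polls_until_done: int):
        self.polls_left = polls_until_done

    def status(self) -> str:
        self.polls_left -= 1
        if self.polls_left > 0:
            return "in progress"
        # Migration completes and the process stops itself; the RPC
        # connection closes and the pending request is unwound as a failure.
        raise CommandFailed("connection to rebalance process closed")


# Plenty of time is allowed, yet the wait fails on the racing request.
print(expect_within(5.0, "completed", RacyRebalance(polls_until_done=3).status))
# prints False
```

The timeout never comes into play here: the wait ends because the one request that overlaps the process exit fails, which matches the "more than enough time" observation above.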

I think it's the rebalance daemon's life cycle that is problematic. It makes it
inconvenient, if not impossible, for glusterd to gather progress/status deterministically.
The rebalance process could instead wait for a rebalance-commit subcommand before terminating.
No other daemon managed by glusterd has this kind of life cycle.
I don't see any good reason why rebalance should kill itself on completion
of data migration.
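A minimal sketch of the proposed life cycle, assuming the daemon simply stays up after migration finishes and exits only once a commit arrives. `RebalanceDaemon`, `State` and the method names are hypothetical; the point is that a status query after completion gets a deterministic answer instead of racing with process exit.

```python
from enum import Enum


class State(Enum):
    IN_PROGRESS = "in progress"
    COMPLETED = "completed"
    EXITED = "exited"


class RebalanceDaemon:
    """Hypothetical daemon that does not exit when data migration finishes;
    it keeps serving status until the commit arrives."""

    def __init__(self) -> None:
        self.state = State.IN_PROGRESS

    def finish_migration(self) -> None:
        # Where the process used to stop itself, it now only records completion.
        self.state = State.COMPLETED

    def status(self) -> str:
        if self.state is State.EXITED:
            raise RuntimeError("daemon has exited")
        return self.state.value

    def commit(self) -> None:
        # Only the commit step ends the process's life cycle.
        if self.state is not State.COMPLETED:
            raise RuntimeError("cannot commit before migration completes")
        self.state = State.EXITED


d = RebalanceDaemon()
d.finish_migration()
print(d.status())  # prints completed, however late the query arrives
d.commit()
```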

Thoughts?

~Krish


Agree with Krish here. Making the rebalance process reply and then exit seems to be the best option.
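A toy model of the "reply, then exit" option, assuming the rebalance process can answer the status requests already outstanding on its connection before it shuts down. `Connection` and `exit_after_completion` are hypothetical names; in the real code the unwinding happens when the RPC connection closes, and this does not try to reproduce that layer.

```python
from typing import Dict, List


class Connection:
    """Hypothetical RPC connection between glusterd and the rebalance
    process, tracking requests that are still waiting for a reply."""

    def __init__(self) -> None:
        self.pending: List[int] = []
        self.results: Dict[int, str] = {}

    def send(self, req_id: int) -> None:
        self.pending.append(req_id)

    def reply(self, req_id: int, payload: str) -> None:
        self.pending.remove(req_id)
        self.results[req_id] = payload

    def close(self) -> None:
        # On disconnect, every request still pending is unwound as a failure.
        for req_id in self.pending:
            self.results[req_id] = "FAILED"
        self.pending.clear()


def exit_after_completion(conn: Connection, reply_first: bool) -> None:
    """Rebalance finishes; optionally answer outstanding status requests
    with the final stats before the process exits."""
    if reply_first:
        for req_id in list(conn.pending):  # copy: reply() mutates pending
            conn.reply(req_id, "completed")
    conn.close()


for reply_first in (False, True):
    conn = Connection()
    conn.send(1)  # glusterd's status request, in flight during completion
    exit_after_completion(conn, reply_first)
    print(reply_first, conn.results[1])
# prints:
# False FAILED
# True completed
```

In this model only requests already in flight are covered; a status request sent after the process is gone would still need an answer from somewhere else, which is what a longer-lived daemon would provide.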

Raghavendra Talur

> Hmmm, is it the kind of thing where the "rebalance status" command
> should retry, if its connection gets closed by a just-completed-
> rebalance (as happened here)?
> 
> Or would that not work as well?
> 
> + Justin
> 
> --
> Open Source and Standards @ Red Hat
> 
> twitter.com/realjustinclift
> 
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@xxxxxxxxxxx
> http://supercolony.gluster.org/mailman/listinfo/gluster-devel
> 
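A sketch of the retry idea, assuming that a second status query, sent after the rebalance process has gone, is answered from glusterd's own record of the completed rebalance. That assumption is exactly what the question above is about and is not confirmed in this thread; `ConnectionClosed`, `status_with_retry` and `racy_status` are hypothetical names, not part of the gluster CLI.

```python
from typing import Callable, Iterator, Union


class ConnectionClosed(Exception):
    """Hypothetical error: the peer closed the RPC connection mid-request."""


def status_with_retry(query: Callable[[], str], retries: int = 1) -> str:
    """Re-issue `query` up to `retries` extra times if the connection closes
    while the request is pending; any other error propagates unchanged."""
    if retries < 0:
        raise ValueError("retries must be >= 0")
    last_error = None
    for _ in range(retries + 1):
        try:
            return query()
        except ConnectionClosed as err:
            last_error = err  # e.g. rebalance exited just after completing
    raise last_error


# Simulated replies: the first request races with the process exit.
replies: Iterator[Union[Exception, str]] = iter(
    [ConnectionClosed("rebalance process exited"), "completed"])


def racy_status() -> str:
    item = next(replies)
    if isinstance(item, Exception):
        raise item
    return item


print(status_with_retry(racy_status))  # prints completed
```

Retrying only on a closed connection keeps genuine failures (bad volume name, etc.) visible to the test instead of masking them.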

-- 
Thanks! 
Raghavendra Talur | Red Hat Storage Developer | Bangalore | +918039245176 
