Re: Mount hangs because of connection delays

Ravishankar N <ravishankar@xxxxxxxxxx> · Thu, 02 Jul 2015 22:54:47 +0530

On 07/02/2015 07:04 PM, Pranith Kumar Karampuri wrote:
hi,
    When the glusterfs mount process is coming up, all cluster xlators wait 
for at least one event from all the children before propagating the 
status upwards. Sometimes the client xlator takes up to 2 minutes to 
propagate this event 
(https://bugzilla.redhat.com/show_bug.cgi?id=1054694#c0). Because of 
this, Xavi implemented a timer in ec notify where we treat a child as 
down if it doesn't come up in 10 seconds. A similar patch went up for 
review at http://review.gluster.org/#/c/11113 for afr. Kritika raised an 
interesting point in the review: all cluster xlators need to have 
this logic for the mount to not hang, so the correct place to fix it 
would be the client xlator itself, i.e. add the timer logic in the 
client xlator. That seems like a better approach.

I think it makes sense to handle the change only in relevant cluster 
xlators like AFR/EC because of the notion of high availability 
associated with them. In my limited understanding, protocol-client is 
the originator (?) of the child up/down events. While it looks okay to 
allow cluster xlators to take certain decisions because the 'originator' 
did not respond within a specific time, altering the originator itself 
without giving a chance to the upper xlators to make choices seems 
incorrect to me. Perhaps I'm wrong, but setting an unconditional 10 
second timer on protocol/client seems to defeat the purpose of having a 
configurable `network.ping-timeout` volume set option.
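
To make the distinction concrete, here is a rough sketch of the 
cluster-xlator-side timer as I understand it. It is a toy model in 
Python, not GlusterFS code: the class, the method names and the explicit 
`now` argument are all made up for illustration, and the 10 second 
default only mirrors the value mentioned above. The point is that the 
timeout lives in the cluster xlator, which decides for itself how long 
it is willing to wait, while the originator of the events is untouched.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

CHILD_UP = "CHILD_UP"
CHILD_DOWN = "CHILD_DOWN"


@dataclass
class ClusterNotifier:
    """Toy model of a cluster xlator waiting for its children at mount time.

    All names here are hypothetical; this is not the afr/ec implementation.
    """
    children: List[str]
    timeout: float = 10.0       # seconds; assumed value, ideally configurable
    started_at: float = 0.0     # time at which the wait began
    status: Dict[str, str] = field(default_factory=dict)
    propagated: Optional[str] = None

    def on_child_event(self, child: str, event: str, now: float) -> Optional[str]:
        """Record an event reported by a child, then try to propagate."""
        self.status[child] = event
        return self.on_tick(now)

    def on_tick(self, now: float) -> Optional[str]:
        """Called periodically; after the timeout, silent children count as down."""
        if self.propagated is None and now - self.started_at >= self.timeout:
            for child in self.children:
                self.status.setdefault(child, CHILD_DOWN)
        return self._maybe_propagate()

    def _maybe_propagate(self) -> Optional[str]:
        """Propagate a status upwards once every child has a known state."""
        if self.propagated is None and all(c in self.status for c in self.children):
            any_up = any(self.status[c] == CHILD_UP for c in self.children)
            self.propagated = CHILD_UP if any_up else CHILD_DOWN
        return self.propagated
```

With two children where only one ever reports, nothing is propagated 
until the timer expires, after which the silent child is treated as down 
and the mount can proceed with the child that is up.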

Just my two cents. :)

I just want to get input from everyone before we go ahead in that 
direction, 
i.e. on PARENT_UP the client xlator will start a timer, and if no rpc 
notification is received within that timeout, the client xlator is 
treated as down.

Pranith
