On 07/02/2015 07:04 PM, Pranith Kumar Karampuri wrote:
hi,
When the glusterfs mount process is coming up, all cluster xlators wait
for at least one event from all of their children before propagating the
status upwards. Sometimes the client xlator takes up to 2 minutes to
propagate this event
(https://bugzilla.redhat.com/show_bug.cgi?id=1054694#c0). Because of
this, Xavi implemented a timer in ec notify that treats a child as
down if it doesn't come up within 10 seconds. A similar patch for afr
is up for review at http://review.gluster.org/#/c/11113. Kritika raised
an interesting point in the review: all cluster xlators need to have
this logic for the mount not to hang, so the correct place to fix it
would be the client xlator itself, i.e. add the timer logic in the
client xlator. That seems like a better approach.
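To make the cluster-side behaviour described above concrete, here is a
rough Python sketch (not the actual ec or afr code, which is C). The
class and method names are hypothetical, and the rule "propagate
CHILD_UP if any child is up" is a simplifying assumption; the real
xlators apply their own rules when deciding what to send upwards.

```python
from enum import Enum


class ChildState(Enum):
    PENDING = "pending"  # no event received from this child yet
    UP = "up"
    DOWN = "down"


class ChildEventTracker:
    """Hypothetical helper: tracks the first event from each child."""

    def __init__(self, child_count):
        self.states = [ChildState.PENDING] * child_count
        self.propagated = None  # status sent upwards, once decided

    def on_child_event(self, index, is_up):
        """Record an event from one child; return the status to propagate, if any."""
        self.states[index] = ChildState.UP if is_up else ChildState.DOWN
        return self._maybe_propagate()

    def on_timer_expired(self):
        """Called when the startup timer (10 seconds in the ec patch) fires."""
        # Children that never reported are treated as down.
        self.states = [
            ChildState.DOWN if s is ChildState.PENDING else s
            for s in self.states
        ]
        return self._maybe_propagate()

    def _maybe_propagate(self):
        # This sketch decides only once; later events do not change the result.
        if self.propagated is not None:
            return self.propagated
        # Keep waiting while any child has not reported.
        if ChildState.PENDING in self.states:
            return None
        # Assumption for illustration: one live child is enough for CHILD_UP.
        any_up = ChildState.UP in self.states
        self.propagated = "CHILD_UP" if any_up else "CHILD_DOWN"
        return self.propagated
```

With three children where only two report, nothing is propagated until
on_timer_expired() is called, at which point the silent child is
counted as down and the mount can proceed.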
I think it makes sense to handle the change only in relevant cluster
xlators like AFR/EC because of the notion of high availability
associated with them. In my limited understanding, protocol-client is
the originator (?) of the child up/down events. While it looks okay to
allow cluster xlators to take certain decisions because the 'originator'
did not respond within a specific time, altering the originator itself
without giving a chance to the upper xlators to make choices seems
incorrect to me. Perhaps I'm wrong, but setting an unconditional
10-second timer on protocol/client seems to defeat the purpose of having
a configurable `network.ping-timeout` volume set option.
Just my two cents. :)
I just want to get input from everyone before we go ahead in that
direction, i.e. on PARENT_UP the client xlator will start a timer, and
if no RPC notification is received within that timeout it treats the
client xlator as down.
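A minimal Python sketch of that client-side proposal, for discussion
only (protocol/client is C and this is not its code). The class name,
the notify_parent callback and the event strings are hypothetical, and
the timeout is left as a parameter rather than fixed at 10 seconds.

```python
import threading


class ClientStartupTimer:
    """Hypothetical model of the proposed timer in the client xlator."""

    def __init__(self, timeout_seconds, notify_parent):
        self.timeout_seconds = timeout_seconds
        self.notify_parent = notify_parent  # callback taking an event name
        self._lock = threading.Lock()
        self._got_rpc_event = False
        self._timer = None

    def parent_up(self):
        """On PARENT_UP, start the timer."""
        self._timer = threading.Timer(self.timeout_seconds, self.expire)
        self._timer.daemon = True  # do not keep the process alive
        self._timer.start()

    def rpc_notify(self, connected):
        """An RPC notification arrived: cancel the timer and report it."""
        with self._lock:
            self._got_rpc_event = True
            if self._timer is not None:
                self._timer.cancel()
            self.notify_parent("CHILD_UP" if connected else "CHILD_DOWN")

    def expire(self):
        """Timer fired: if no RPC notification was seen, report down."""
        with self._lock:
            if self._got_rpc_event:
                return
            self.notify_parent("CHILD_DOWN")
```

In this sketch a late RPC notification is still forwarded after the
timeout, so a brick that connects after the timer has fired would be
reported as CHILD_UP at that point.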
Pranith