Re: Volume ping-timeout parameter and client side mount timeouts

Mohammed Rafi K C <rkavunga@xxxxxxxxxx> · Tue, 15 Nov 2016 15:22:14 +0530

If I understand the query correctly, the problem is that gluster takes
more than 20 seconds to time out: the disconnect was reported only after
the brick had been offline for about 35 seconds. With that assumption, I
have some questions.

How did you determine that the timer expired only after 35 seconds? From
the log file? If so, note that glusterfs waits some time before flushing
the logs so that it can push them as a batch; I am not sure how long. So
the timing seen in the logs may not be accurate.
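
For reference, the gap under discussion can be worked out from the two
timestamps in the quoted test below. This is only a small sketch: it
assumes both timestamps are in UTC (the `date` output says so; for the
bracketed log timestamp that is my assumption), and the variable names are
made up for illustration. It measures the gap between the two recorded
times, not the moment the socket was actually disconnected.

```python
from datetime import datetime, timezone

# Output of `date` on the brick node just before the iptables rules were added.
blocked_at = datetime(2016, 11, 14, 8, 26, 55, tzinfo=timezone.utc)

# Bracketed timestamp from the client log line (assumed to be UTC).
logged_at = datetime.strptime(
    "2016-11-14 08:27:30.275694", "%Y-%m-%d %H:%M:%S.%f"
).replace(tzinfo=timezone.utc)

PING_TIMEOUT = 20.0  # seconds, as set with "gluster volume set gv0 ping-timeout 20"

elapsed = (logged_at - blocked_at).total_seconds()
excess = elapsed - PING_TIMEOUT

print(f"elapsed: {elapsed:.3f} s, excess over ping-timeout: {excess:.3f} s")
```

Note that `date` has one-second resolution and ran before the iptables
commands, so the result is only good to about a second.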

If you have already confirmed with Wireshark or a similar tool that it
takes more than 20 seconds to disconnect the socket, then there could be
something else which we need to look into.

Can you confirm that using Wireshark or a similar tool, if not already done?
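
If a packet capture is not practical, a rough client-side measurement that
does not depend on log timestamps might look like the sketch below. It is
not a substitute for a capture: it only times how long a call on the mount
point blocks. The function name and the idea of probing with os.stat are my
own assumptions, and a stat may be answered from a cache without blocking
at all, so reading or writing a small test file may be a better probe.

```python
import os
import time


def longest_stall(path, duration, interval=0.5):
    """Poll `path` with os.stat for about `duration` seconds.

    Returns the longest time (in seconds) that a single call took.
    """
    deadline = time.monotonic() + duration
    worst = 0.0
    while time.monotonic() < deadline:
        start = time.monotonic()
        try:
            os.stat(path)
        except OSError:
            # A failed call still tells us how long the client blocked.
            pass
        worst = max(worst, time.monotonic() - start)
        time.sleep(interval)
    return worst
```

Start it on the client against the glusterfs mount point just before
blocking the brick, with a duration of a minute or two, and compare the
returned value with the configured ping-timeout.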

Rafi KC

On 11/14/2016 09:13 PM, Martin Schlegel wrote:
> Hello Gluster Community
>
> We have 2x brick nodes running with replication for a volume gv0 for which we set
> "gluster volume set gv0 ping-timeout 20".
>
> In our tests it seemed there is an unknown delay with this ping-timeout - we see it
> timing out much later after about 35 seconds and not at around 20 seconds (see
> test below).
>
> Our distributed database cluster is using Gluster as a secondary file system for
> backups etc. - its Pacemaker cluster manager needs to know how long to wait
> before giving up on the glusterfs mounted file system to become available again
> or when to failover to another node.
>
> 1. When do we know when to give up waiting on the glusterfs mount point to
> become accessible again following an outage on the brick server this client was
> connected to ?
> 2. Is there a timeout / interval setting on the client side that we could
> reduce, so that it more quickly tries to switch the mount point to a different,
> available brick server ?
>
>
> Regards,
> Martin Schlegel
>
> __________
>
> Here is how we tested this:
>
> As a test we blocked the entire network on one of these brick nodes:
> root@glusterfs-brick-node1 $ date; iptables -A INPUT -i bond0 -j DROP; iptables -A OUTPUT -o bond0 -j DROP
> Mon Nov 14 08:26:55 UTC 2016
>
> From the syslog on the glusterfs-client-node
> Nov 14 08:27:30 glusterfs-client-node1 pgshared1[26783]: [2016-11-14
> 08:27:30.275694] C [rpc-clnt-ping.c:160:rpc_clnt_ping_timer_expired]
> 0-gv0-client-0: server glusterfs-brick-node1:49152 has not responded in the last
> 20 seconds, disconnecting.
>
> <--- This last message "has not responded in the last 20 seconds" is confusing
> to me, because the brick node was clearly blocked for 35 seconds already ! Is
> there some client-side check interval that can be reduced ?
> _______________________________________________
> Gluster-users mailing list
> Gluster-users@xxxxxxxxxxx
> http://www.gluster.org/mailman/listinfo/gluster-users
