On Tue, May 20, 2014 at 07:15:44PM -0700, Anand Avati wrote:
> Niels,
> This is a good addition. While gluster clients do a reasonably good job at
> detecting dead/hung servers with ping-timeout, the server-side detection
> has been rather weak. TCP_KEEPALIVE has helped to some extent, for cases
> where an idling client (which holds a lock) goes dead. However, if an active
> client with pending data in the server's socket buffer dies, we have been
> subject to waiting for the long TCP retransmissions to finish before giving up.
>
> The way I see it, this option is complementary to TCP_KEEPALIVE (keepalive
> works for idle and only idle connections; user_timeout works only when
> there are pending acknowledgements, thus covering the full spectrum). To
> that end, it might make sense to present the admin a single timeout
> configuration value rather than two. It would be very frustrating for the
> admin to configure one of them to, say, 30 seconds, and then find that the
> server does not clean up after 30 seconds of a hung client only because the
> connection was idle (or not idle). Configuring a second timeout for the
> other case can be very unintuitive.
>
> In fact, I would suggest having a single network timeout configuration,
> which gets applied to all three: ping-timeout on the client, user_timeout
> on the server, and keepalive on both. I think that is what a user would
> expect anyway. Each covers a slightly different technical situation, but
> all are just internal details as far as a user is concerned.
>
> Thoughts?

Sure, sounds good to me. I was thinking about using the network.ping-timeout
option for the TCP_USER_TIMEOUT value too. Is that what you suggest, and
should that value be applied to the TCP_KEEPALIVE settings as well?

Thanks,
Niels

> On Tue, May 20, 2014 at 4:30 AM, Niels de Vos <ndevos@xxxxxxxxxx> wrote:
>
> > Hi all,
> >
> > the last few days I've been looking at a problem [1] where a client
> > locks a file over a FUSE-mount, and a 2nd client tries to grab that
> > lock too.
> > It is expected that the 2nd client gets blocked until the 1st client
> > releases the lock. This all works as long as the 1st client cleanly
> > releases the lock.
> >
> > Whenever the 1st client crashes (like a kernel panic) or the network is
> > split and the 1st client is unreachable, the 2nd client may not get the
> > lock until the bricks detect that the connection to the 1st client is
> > dead. If there are pending replies, the bricks may need 15-20 minutes
> > until the re-transmissions of the replies have timed out.
> >
> > The current default of 15-20 minutes is quite long for a fail-over
> > scenario. Relatively recently [2], the Linux kernel got
> > a TCP_USER_TIMEOUT socket option (similar to TCP_KEEPALIVE). This option
> > can be used to configure a per-socket timeout, instead of a system-wide
> > configuration through the net.ipv4.tcp_retries2 sysctl.
> >
> > The default network.ping-timeout is set to 42 seconds. I'd like to
> > propose a network.tcp-timeout option that can be set per volume. This
> > option should then set TCP_USER_TIMEOUT for the socket, which causes
> > re-transmission failures to be fatal after the timeout has passed.
> >
> > Now the remaining question: what should the default timeout in seconds
> > be for this new network.tcp-timeout option? I'm currently thinking of
> > making it high enough (like 5 minutes) to prevent false positives.
> >
> > Thoughts and comments welcome,
> > Niels
> >
> >
> > 1 https://bugzilla.redhat.com/show_bug.cgi?id=1099460
> > 2 http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=dca43c7

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-devel