Re: Spurious disconnections / connectivity loss

Stephan von Krawczynski wrote:
On Fri, 29 Jan 2010 18:41:10 +0000
Gordan Bobic <gordan@xxxxxxxxxx> wrote:

I'm seeing things like this in the logs, coupled with things locking up for a while until the timeout is complete:

[2010-01-29 18:29:01] E [client-protocol.c:415:client_ping_timer_expired] home2: Server 10.2.0.10:6997 has not responded in the last 42 seconds, disconnecting.
[2010-01-29 18:29:01] E [client-protocol.c:415:client_ping_timer_expired] home2: Server 10.2.0.10:6997 has not responded in the last 42 seconds, disconnecting.

The thing is, I know for a fact that there is no network outage of any sort. All the machines are on a local gigabit ethernet, and there is no connectivity loss observed anywhere else. ssh sessions going to the machines that are supposedly "not responding" remain alive and well, with no lag.

What you're seeing here is exactly what made us increase the ping-timeout to
120.
To us it is obvious that the keep-alive strategy does not cope with minimal
packet loss. On _every_ network you can see packet loss (read the docs of your
switch carefully). We had the impression that the strategy implemented is not
aware of the fact that a lost ping packet is no proof of a disconnected
server, only a hint to take a closer look.

It sounds like there need to be more heartbeats per minute. One packet per 10 seconds might be a good figure to start with, but I cannot see that even one packet per second would be harmful unless the number of nodes gets very large. Disconnection should be triggered only after some threshold number (certainly > 1) of those are lost in a row. Are there options to tune such parameters in the volume spec file?
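
To make the idea concrete, here is a rough sketch in Python of the kind of logic I mean. It is not how GlusterFS implements its ping timer. The names HeartbeatMonitor, interval_s and max_missed are made up for illustration, and the values (10 seconds, 3 misses) are only assumptions to start the discussion.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatMonitor:
    """Declare a peer down only after several consecutive missed heartbeats.

    Hypothetical sketch: names and defaults are illustrative assumptions,
    not GlusterFS options.
    """

    interval_s: float = 10.0  # assumed: one ping every 10 seconds
    max_missed: int = 3       # assumed threshold; should be > 1
    missed: int = 0           # consecutive pings without a reply
    connected: bool = True

    def on_tick(self, reply_received: bool) -> bool:
        """Record the outcome of one ping interval; return the link state."""
        if reply_received:
            # Any reply resets the counter, so isolated packet loss is ignored.
            self.missed = 0
            self.connected = True
        else:
            self.missed += 1
            if self.missed >= self.max_missed:
                self.connected = False
        return self.connected

    def worst_case_detection_s(self) -> float:
        """Rough upper bound on the time needed to notice a dead peer."""
        return self.interval_s * self.max_missed


def simulate(outcomes, max_missed=3):
    """Feed a sequence of ping outcomes (True = reply seen) to a monitor."""
    monitor = HeartbeatMonitor(max_missed=max_missed)
    return [monitor.on_tick(ok) for ok in outcomes]
```

With these assumed numbers a single lost ping never causes a disconnect, and a genuinely dead server is still noticed after roughly interval_s * max_missed = 30 seconds, which is shorter than the 42 seconds in the log above.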

Gordan



