Re: Spurious disconnections / connectivity loss

On Fri, 29 Jan 2010 18:41:10 +0000
Gordan Bobic <gordan@xxxxxxxxxx> wrote:

> I'm seeing things like this in the logs, coupled with things locking up 
> for a while until the timeout is complete:
> 
> [2010-01-29 18:29:01] E 
> [client-protocol.c:415:client_ping_timer_expired] home2: Server 
> 10.2.0.10:6997 has not responded in the last 42 seconds, disconnecting.
> [2010-01-29 18:29:01] E 
> [client-protocol.c:415:client_ping_timer_expired] home2: Server 
> 10.2.0.10:6997 has not responded in the last 42 seconds, disconnecting.
> 
> The thing is, I know for a fact that there is no network outage of any 
> sort. All the machines are on a local gigabit ethernet, and there is no 
> connectivity loss observed anywhere else. ssh sessions going to the 
> machines that are supposedly "not responding" remain alive and well, 
> with no lag.

What you're seeing here is exactly what made us increase the ping-timeout to
120 seconds.
To us it is obvious that the keep-alive strategy does not cope with even minimal
packet loss. You can see packet loss on _every_ network (read the docs of your
switch carefully). Our impression is that the implemented strategy does not
take into account that a lost ping packet is no proof of a disconnected
server, only a hint to take a closer look.
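
To illustrate the point, here is a minimal sketch of a keep-alive check that
treats a missed ping as a hint rather than proof. It is not how GlusterFS
implements its ping timer; the names KeepAliveMonitor, max_missed and
first_disconnect are made up for this example, and the threshold of 3 is an
arbitrary assumption.

```python
class KeepAliveMonitor:
    """Hypothetical monitor: a peer counts as disconnected only after
    several consecutive pings have gone unanswered."""

    def __init__(self, max_missed=3):
        # max_missed is an assumed tuning knob, not a GlusterFS option.
        self.max_missed = max_missed
        self.missed = 0

    def record(self, reply_received):
        """Feed one ping result; return True if the peer should now be
        treated as disconnected."""
        if reply_received:
            # Any reply proves the server is alive, so start counting again.
            self.missed = 0
        else:
            # A lost ping is only a hint: count it and keep probing.
            self.missed += 1
        return self.missed >= self.max_missed


def first_disconnect(results, max_missed=3):
    """Return the index of the ping at which the peer would be declared
    disconnected, or None if that never happens."""
    monitor = KeepAliveMonitor(max_missed)
    for index, reply_received in enumerate(results):
        if monitor.record(reply_received):
            return index
    return None
```

With a threshold of 3, isolated losses such as reply, loss, reply, loss never
cause a disconnect, while three losses in a row do. With a threshold of 1, the
first lost ping is enough, which is the behaviour complained about above.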

-- 
Regards,
Stephan




