Guys, since it is now clear that we are not behind NAT, we can forget about reducing the keep-alive time (/proc/sys/net/ipv4/tcp_keepalive_time) to a value less than the NAT timeout. Anyway, I learned something new :) .

The most likely reason, which Michael also agreed with, may be the high load on both systems. So do you suggest we now enable the ClientAliveInterval option? Since ClientAliveCountMax defaults to 3, I will probably keep ClientAliveInterval below 300 secs; most likely I will set it to 60 secs, so the connection will be dropped after 180 secs (60 x 3) if there is no response. (A rough sketch of the sshd_config lines I have in mind is at the bottom of this mail, below the quoted text.)

Also, somewhat strangely, the TCPKeepAlive option is disabled in our sshd_config file; not sure why. If ClientAliveInterval is enabled, can we leave TCPKeepAlive disabled? Will that serve our purpose?

Thanks
Zaman

On Tue, Jun 8, 2010 at 9:49 PM, Glynn Clements <glynn@xxxxxxxxxxxxxxxxxx> wrote:
>
> query wrote:
>
>> > I can't see how this can be caused by load. If you haven't yet enabled
>> > ClientAliveInterval, then the connection isn't being closed by sshd
>> > but by the kernel, due to TCP keep-alives not being acknowledged.
>>
>> okay... that may be the cause. The client host was also busy, because
>> of which the TCP keep-alives were not acknowledged.
>
> Load won't have any effect upon TCP keep-alives, as it's the kernel
> which acknowledges keep-alive packets, not the user process.
>
> Keep-alive allows you to detect that a host is unreachable (e.g.
> network failure, system crash, power failure, etc). It doesn't tell
> you anything about an individual process.
>
>> > As Michal suggests, the most likely reason for this is a NAT timeout.
>> > If you're using NAT, you probably want to set the keep-alive time
>> > (/proc/sys/net/ipv4/tcp_keepalive_time) to a value less than the NAT
>> > timeout. Even then, that will only work for programs which enable
>> > keep-alive (ssh and sshd both do by default; this is controlled by the
>> > TCPKeepAlive option).
>>
>> How do I determine the value of the NAT timeout? Is it set at the host
>> level or on the device where NAT is implemented?
>
> The device which performs NAT.
>
>> I was able to find the keepalive timeout values on the host:
>>
>> ====
>> $ sudo sysctl -a | grep -i keepalive
>> net.ipv4.tcp_keepalive_time = 7200
>> net.ipv4.tcp_keepalive_probes = 9
>> net.ipv4.tcp_keepalive_intvl = 75
>> ====
>>
>> Most likely I am not behind NAT; I will confirm it tomorrow. If that
>> is the case, which should I adjust: the kernel timeout value, the
>> TCPKeepAlive option, or the ClientAliveInterval option? The
>> TCPKeepAlive option is somehow disabled in the sshd config file.
>> Please clarify regarding this.
>
> TCPKeepAlive is enabled by default. But even if it's enabled, the
> 2-hour wait before any keep-alives are sent typically won't be enough
> to prevent NAT entries from expiring.
>
> Even the 5-minute interval between SSH keep-alives may be longer than
> the NAT expiry time. Low-end router/modem devices with built-in NAT
> seem to base their default configuration on the assumption that you're
> using HTTP from Win95 boxes, where a connection being idle for more
> than 30 seconds usually means that the Win95 box has crashed.
>
> Another possibility is a really cheap ISP which uses (a heavily
> oversubscribed pool of) dynamic IP addresses, which expire whenever
> the connection is idle for more than a minute.
>
> --
> Glynn Clements <glynn@xxxxxxxxxxxxxxxxxx>
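P.S. For reference, here is a rough sketch of the sshd_config lines I have in mind. The file path and the reload command below are just what I expect from our standard setup, so they may need adjusting per host; the option names and the ClientAliveCountMax default of 3 are as documented in sshd_config(5).

====
# /etc/ssh/sshd_config (server side)

# sshd sends an "alive" probe through the encrypted channel after
# every 60 secs of inactivity from the client.
ClientAliveInterval 60

# Already the default; after 3 unanswered probes (60 x 3 = 180 secs)
# sshd terminates the session.
ClientAliveCountMax 3

# Left disabled, as it currently is in our file; dead-peer detection
# is then handled entirely by the ClientAlive* mechanism above.
TCPKeepAlive no
====

And then reload sshd for the change to take effect, e.g.:

====
$ sudo /etc/init.d/sshd reload
====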
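P.P.S. Just to record what I learned about the kernel keep-alive tunables, in case we ever do sit behind NAT: lowering net.ipv4.tcp_keepalive_time below the NAT timeout would look something like the following. The numbers here are only illustrative; the right values would depend on the actual NAT expiry time.

====
# Takes effect immediately, but is lost on reboot:
$ sudo sysctl -w net.ipv4.tcp_keepalive_time=300    # first probe after 5 min idle
$ sudo sysctl -w net.ipv4.tcp_keepalive_intvl=60    # subsequent probes 60 secs apart
$ sudo sysctl -w net.ipv4.tcp_keepalive_probes=5    # declare peer dead after 5 failures

# To persist across reboots, add the same keys to /etc/sysctl.conf:
#   net.ipv4.tcp_keepalive_time = 300
#   net.ipv4.tcp_keepalive_intvl = 60
#   net.ipv4.tcp_keepalive_probes = 5
====

As Glynn noted, this only affects connections whose programs enable keep-alive on their sockets (ssh and sshd do, when TCPKeepAlive is on).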