query wrote:

> > I can't see how this can be caused by load. If you haven't yet enabled
> > ClientAliveInterval, then the connection isn't being closed by sshd
> > but by the kernel, due to TCP keep-alives not being acknowledged.
>
> okay...that may be the cause . The client host was also busy because
> of which TCP keep-alive were not acknowledged.

Load won't have any effect upon TCP keep-alives, as it's the kernel
which acknowledges keep-alive packets, not the user process.

Keep-alive allows you to detect that a host is unreachable (e.g.
network failure, system crash, power failure, etc). It doesn't tell
you anything about an individual process.

> > As Michal suggests, the most likely reason for this is a NAT timeout.
> > If you're using NAT, you probably want to set the keep-alive time
> > (/proc/sys/net/ipv4/tcp_keepalive_time) to a value less than the NAT
> > timeout. Even then, that will only work for programs which enable
> > keep-alive (ssh and sshd both do by default; this is controlled by the
> > TCPKeepAlive option).
>
> How to determine the value of NAT timeout . Is it at the host level or
> the device where NATing is implemented .

The device which performs NAT.

> I was able to find the keepalive timeout value at the host .
>
> ====
> $ sudo sysctl -a | grep -i keepalive
> net.ipv4.tcp_keepalive_time = 7200
> net.ipv4.tcp_keepalive_probes = 9
> net.ipv4.tcp_keepalive_intvl = 75
> =====
>
> Most likely I am not behind NAT , I will confirm it tomorrow . If that
> is the case , then which should I consider to increase the timeout
> value.
> The kernel timeout value or implement either TCPKeepAlive option or
> the ClientAliveInterval interval . TCPKeepAlive option is somehow
> disabled in the sshd config file . Please clarify regarding this.

TCPKeepAlive is enabled by default. But even if it's enabled, the
2-hour wait before any keep-alives are sent typically won't be enough
to prevent NAT entries from expiring.
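
To put numbers on that, here is a rough sketch of the arithmetic, using
the three sysctl values quoted above. The function name
keepalive_worst_case is made up for illustration, and the 300 in the
commented-out command is only an assumed example value; pick something
below whatever the timeout on your NAT device actually is.

```shell
# Hypothetical helper: seconds of idle time before the kernel gives up
# on an unreachable peer = idle time before the first probe
# + (number of unanswered probes * interval between probes).
keepalive_worst_case() {
    ka_time=$1; ka_probes=$2; ka_intvl=$3
    echo $(( ka_time + ka_probes * ka_intvl ))
}

# With the values from the sysctl output quoted above:
keepalive_worst_case 7200 9 75

# Lowering the idle time needs root. 300 is an assumed example value:
#   sysctl -w net.ipv4.tcp_keepalive_time=300
```

With the defaults this prints 7875, i.e. the first probe only goes out
after 7200 seconds of idle time, and an unreachable peer is only given
up on 675 seconds after that.
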
Even the 5-minute interval between SSH keep-alives may be longer than
the NAT expiry time. Low-end router/modem devices with built-in NAT
seem to base their default configuration on the assumption that you're
using HTTP from Win95 boxes, where a connection being idle for more
than 30 seconds usually means that the Win95 box has crashed.

Another possibility is a really cheap ISP which uses (a heavily
oversubscribed pool of) dynamic IP addresses, which expire whenever
the connection is idle for more than a minute.

-- 
Glynn Clements <glynn@xxxxxxxxxxxxxxxxxx>