Hi,

Another setting to try, on the client side, is ServerAliveInterval. This
makes the client send keep-alive messages within the SSH protocol itself,
as opposed to TCPKeepAlive, which operates on the underlying TCP
connection. I have had the misfortune to be behind firewalls that
harvested "dead" connections far too quickly, in my opinion, and this
setting worked for me where TCPKeepAlive didn't. Worth a try.

Cheers,

Adam

On 09/06/10 07:44, query wrote:
> Guys, since we are clear now that we are not behind NAT, we can
> forget about reducing the keep-alive time
> (/proc/sys/net/ipv4/tcp_keepalive_time) to a value less than the NAT
> timeout. But anyway, I learned something new :).
> The most likely reason, which Michael also agreed with, may be the high
> load on both systems.
>
> So, do you now suggest enabling the ClientAliveInterval option? Also,
> since ClientAliveCountMax is enabled by default with a value of 3,
> I will probably keep the value of ClientAliveInterval below 300
> secs. I will probably keep it at 60 secs, so the connection will
> drop out after 180 secs if there is no response.
>
> Also, somewhat strangely, the TCPKeepAlive option is disabled in our
> sshd_config file; not sure why. So, if ClientAliveInterval is
> enabled, can we leave TCPKeepAlive disabled? Will that serve our
> purpose?
>
>
> Thanks
> Zaman
>
> On Tue, Jun 8, 2010 at 9:49 PM, Glynn Clements <glynn@xxxxxxxxxxxxxxxxxx> wrote:
>>
>> query wrote:
>>
>>>> I can't see how this can be caused by load. If you haven't yet enabled
>>>> ClientAliveInterval, then the connection isn't being closed by sshd
>>>> but by the kernel, due to TCP keep-alives not being acknowledged.
>>>
>>> okay...that may be the cause. The client host was also busy, because
>>> of which TCP keep-alives were not acknowledged.
>>
>> Load won't have any effect upon TCP keep-alives, as it's the kernel
>> which acknowledges keep-alive packets, not the user process.
>>
>> Keep-alive allows you to detect that a host is unreachable (e.g.
>> network failure, system crash, power failure, etc). It doesn't tell
>> you anything about an individual process.
>>
>>>> As Michal suggests, the most likely reason for this is a NAT timeout.
>>>> If you're using NAT, you probably want to set the keep-alive time
>>>> (/proc/sys/net/ipv4/tcp_keepalive_time) to a value less than the NAT
>>>> timeout. Even then, that will only work for programs which enable
>>>> keep-alive (ssh and sshd both do by default; this is controlled by the
>>>> TCPKeepAlive option).
>>>
>>> How do I determine the value of the NAT timeout? Is it set at the host
>>> level or on the device where NAT is implemented?
>>
>> The device which performs NAT.
>>
>>> I was able to find the keepalive timeout value at the host.
>>>
>>> ====
>>> $ sudo sysctl -a | grep -i keepalive
>>> net.ipv4.tcp_keepalive_time = 7200
>>> net.ipv4.tcp_keepalive_probes = 9
>>> net.ipv4.tcp_keepalive_intvl = 75
>>> =====
>>>
>>> Most likely I am not behind NAT; I will confirm it tomorrow. If that
>>> is the case, then which should I change to increase the timeout
>>> value: the kernel timeout value, or either the TCPKeepAlive option or
>>> the ClientAliveInterval option? The TCPKeepAlive option is somehow
>>> disabled in the sshd config file. Please clarify regarding this.
>>
>> TCPKeepAlive is enabled by default. But even if it's enabled, the
>> 2-hour wait before any keep-alives are sent typically won't be enough
>> to prevent NAT entries from expiring.
>>
>> Even the 5-minute interval between SSH keep-alives may be longer than
>> the NAT expiry time. Low-end router/modem devices with built-in NAT
>> seem to base their default configuration on the assumption that you're
>> using HTTP from Win95 boxes, where a connection being idle for more
>> than 30 seconds usually means that the Win95 box has crashed.
>>
>> Another possibility is a really cheap ISP which uses (a heavily
>> oversubscribed pool of) dynamic IP addresses, which expire whenever
>> the connection is idle for more than a minute.
>>
>> --
>> Glynn Clements <glynn@xxxxxxxxxxxxxxxxxx>
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-admin" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
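
For reference, here is a minimal sketch of the SSH-level keep-alive
settings discussed in this thread. The 60-second interval is only the
illustrative value mentioned in the quoted message, the "Host *" pattern
is an assumption (narrow it to the hosts that need it), and the file
paths are the usual OpenSSH locations, which may differ on your system.

```
# Client side: ~/.ssh/config (path assumed; the usual per-user OpenSSH file)
Host *
    # Send an SSH-level keep-alive after 60 s without data from the server
    ServerAliveInterval 60
    # Disconnect after 3 unanswered keep-alives (3 is the OpenSSH default)
    ServerAliveCountMax 3

# Server side: /etc/ssh/sshd_config (path assumed; a separate file)
# Probe a silent client every 60 s and give up after 3 unanswered probes
ClientAliveInterval 60
ClientAliveCountMax 3
```

With these values an unresponsive peer should be dropped after roughly
60 x 3 = 180 seconds. sshd has to reload its configuration before the
server-side change takes effect.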