Re: Preventing SSH timeouts. Some clarification needed

Glynn Clements <glynn@xxxxxxxxxxxxxxxxxx> · Tue, 8 Jun 2010 17:19:18 +0100

query wrote:

> > I can't see how this can be caused by load. If you haven't yet enabled
> > ClientAliveInterval, then the connection isn't being closed by sshd
> > but by the kernel, due to TCP keep-alives not being acknowledged.
> 
> Okay, that may be the cause. The client host was also busy, because
> of which TCP keep-alives were not acknowledged.

Load won't have any effect upon TCP keep-alives, as it's the kernel
which acknowledges keep-alive packets, not the user process.

Keep-alive allows you to detect that a host is unreachable (e.g. 
network failure, system crash, power failure, etc). It doesn't tell
you anything about an individual process.
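
As a rough illustration of that division of labour (a minimal sketch,
not taken from the ssh source): a program only asks for keep-alive on
its socket, and the kernel then sends and acknowledges the probes by
itself. The helper name enable_keepalive and the values 60/10/5 are
made up for the example, and the TCP_KEEP* options are
platform-specific, so the code skips any that are missing.

```python
import socket

def enable_keepalive(sock, idle=60, interval=10, probes=5):
    """Ask the kernel to run TCP keep-alive on this one socket.

    idle/interval/probes are illustrative values, not recommendations.
    """
    # This is the only step the process itself has to take.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Optional per-socket overrides of the kernel-wide defaults
    # (tcp_keepalive_time, tcp_keepalive_intvl, tcp_keepalive_probes).
    # Not every platform defines all of these, hence the getattr().
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", probes)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock)
# Non-zero means keep-alive is switched on for this socket.
enabled = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)
sock.close()
```

From here on the process does nothing further; a busy or even stopped
process still has its keep-alive probes answered by the kernel.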

> > As Michal suggests, the most likely reason for this is a NAT timeout.
> > If you're using NAT, you probably want to set the keep-alive time
> > (/proc/sys/net/ipv4/tcp_keepalive_time) to a value less than the NAT
> > timeout. Even then, that will only work for programs which enable
> > keep-alive (ssh and sshd both do by default; this is controlled by the
> > TCPKeepAlive option).
> 
> How do I determine the value of the NAT timeout? Is it set at the host
> level or on the device where NAT is implemented?

The device which performs NAT.

> I was able to find the keepalive timeout values on the host.
> 
> ====
> $ sudo sysctl -a | grep -i keepalive
> net.ipv4.tcp_keepalive_time = 7200
> net.ipv4.tcp_keepalive_probes = 9
> net.ipv4.tcp_keepalive_intvl = 75
> =====
> 
> Most likely I am not behind NAT; I will confirm it tomorrow. If that
> is the case, which timeout value should I change: the kernel
> keep-alive time, the TCPKeepAlive option, or ClientAliveInterval?
> The TCPKeepAlive option is somehow disabled in the sshd config file.
> Please clarify.

TCPKeepAlive is enabled by default. But even if it's enabled, the
2-hour wait before any keep-alives are sent is typically far too long
to prevent NAT entries from expiring.
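
To put numbers on that, here is a small sketch using the sysctl values
quoted above (7200/9/75). The 300-second NAT timeout is a made-up
figure purely for illustration; the real value depends on the NAT
device. The function name keepalive_timeline is likewise just for the
example, and the give-up time is approximate.

```python
def keepalive_timeline(time_s, probes, intvl_s):
    """Return (idle seconds before the first probe is sent,
    approximate idle seconds before the kernel gives up on a peer
    that never answers)."""
    first_probe = time_s
    give_up = time_s + probes * intvl_s
    return first_probe, give_up

# Values from the sysctl output quoted earlier in this message.
first_probe, give_up = keepalive_timeline(7200, 9, 75)

# Hypothetical NAT idle timeout; check the actual device.
nat_timeout = 300

# Keep-alives only keep the NAT entry alive if the first probe is sent
# before the entry expires.
probe_in_time = first_probe < nat_timeout

print(first_probe, give_up, probe_in_time)  # 7200 7875 False
```

With the defaults, the first probe would go out long after such a NAT
entry had been dropped, which is why tcp_keepalive_time needs to be
below the NAT timeout for this to help.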

Even the 5-minute interval between SSH keep-alives may be longer than
the NAT expiry time. Low-end router/modem devices with built-in NAT
seem to base their default configuration on the assumption that you're
using HTTP from Win95 boxes, where a connection being idle for more
than 30 seconds usually means that the Win95 box has crashed.

Another possibility is a really cheap ISP which uses (a heavily
oversubscribed pool of) dynamic IP addresses, which expire whenever
the connection is idle for more than a minute.

-- 
Glynn Clements <glynn@xxxxxxxxxxxxxxxxxx>