Re: Postgres Disconnection problems

Scott Marlowe <smarlowe@xxxxxxxxxxxxxxxxx> · Mon, 21 Nov 2005 08:58:36 -0600

On Fri, 2005-11-18 at 18:51, Otto Blomqvist wrote:
> Hi,
> 
> We are using PostgresDAC 2.2.1 and PostgreSQL 8.0.2 on
> i386-redhat-linux-gnu, compiled by GCC i386-redhat-linux-gcc (GCC) 4.0.0
> 20050412 (Red Hat 4.0.0-0.42).
> 
> I perform a simple test as follows.
> 
> 1. I connect to the database, which is located on a LAN.
> 
> 2. I simulate Internet problems by unplugging the Ethernet cable of the
> client. There is no PSQL activity going on.
> 
> 3. Plug the ethernet cable back in
> 
> 4. Run some sql, which gives me a Postgres SQL error -1, Server closed
> connection unexpectedly
> 
> So far so good. Problem is that the postmaster does not detect this
> connection as dead and keeps it idle for an unknown amount of time. This is
> a real problem for us because we use persistent connections to authorize
> access to a custom built 68030 based system, which has a limited number of
> "slots" that we can use. By not releasing a dead connection we are also
> holding that 68030 slot busy.

The real issue here is that TCP keepalive keeps the connection alive for
a long time.  The default on linux boxen is 2+hours.  In our production
environment, we dropped tcp_keepalive to 5 minutes.  There are four
settings in the linux kernel:

net.ipv4.tcp_keepalive_intvl = 75
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_time = 500

the keepalive_time tells the kernel how long to wait to "ping" a
connection after it's gone quiet.  The probes and intvl tell it how many
times to try and re-awaken it and how long to wait between each. So,
with the settings shown above, a dead connection will wait 8.3 minutes,
then execute a "ping" (not really a ping, it's on a lower level than a
real ping would be) and then will wait 1.25 minutes and do it 9 times. 
So, in this scenario, an idle connection left from a network failure
will take just under 20 minutes to clear.