Re: keep alive and running query

desmodemone <desmodemone@xxxxxxxxx> · Mon, 16 Dec 2013 14:16:04 +0100

2013/12/16 Albe Laurenz <laurenz.albe@xxxxxxxxxx>

desmodemone wrote:

> 2013/12/16 Albe Laurenz <laurenz.albe@xxxxxxxxxx>

>> desmodemone wrote:

>> As I tested and saw until now, the keep alive functions as follow [if I understand correctly and it's

>> not a bug] :

>>

>> When a connection it's in idle state or in idle in transaction, if the connection with client it's

>> broken  for a number of keep alive, the backend will be terminated.

>>

>>

>> By the way if this could be ok in an OLTP enviroment, because the average time of a query is << the

>> time of keep alive,  in a DWH enviroment could be a problem.

>>

>> Imagine your application server, where there is an ETL, will go down for 1 minute and your

>> transactions are still running on the DWH database, that transactions could run for hours before the

>>

>> keep alive will terminate them, because they are in transaction state and not idle or idle in

>> transaction.

> TCP keepalive will also terminate a session that is currently

> stuck in a long running SQL query if the client end dies.

>

> I think that your problem is that you mix up different meanings of "idle".

>

> In PostgreSQL, a connection is idle (or idle in transaction) if processing

> of the last command is finished and the server is waiting for the next

> command from the client.

>

> In TCP, a connection is idle if there is no network traffic.

>                          so I have a strange behavior on some test servers.

>

> I am using Centos 6.4 and I set up the tcp keep alive kernel parameter very low to see the effects

> [the postgresql.conf parameter have 0 value so use the OS value] :

>

> net.ipv4.tcp_keepalive_intvl = 2

> net.ipv4.tcp_keepalive_probes = 2

> net.ipv4.tcp_keepalive_time = 1

>

>

> then I run a long query from a remote client and I kill that "psql client" . By the way, even even if

> after long time, the query remains alive,

>

> until it finishes and it returns error because could not return the row to the client.

>

>

> If I do the same with an update, it's the same except become idle in transaction and only after then

> it's killed .

>

>

> So, if " TCP keepalive will also terminate a session that is currently stuck in a long running SQL

> query if the client end dies."  ,

> what is  wrong on the setup ? Could someone try the tcp keep alive or explain why is not working as

> expected  ?

That's as expected; the problem is that I was imprecise and misleading.

Not only does "idle" stand for two things, but also "connection".

While TCP keepalive will terminate a TCP connection after a while

if it detects that the remote end does no longer respond, that does

not imply that the PostgreSQL session is immediately terminated.

That only happens when PostgreSQL tries to read or write on the socket

belonging to the dead TCP connection.

So a long running query will happily continue until it is ready to

write to the socket and then detect that the socket is no longer there.

In the case of an idle session, the server keeps trying to read from

the socket and will notice when it goes away.

So in effect setting low keepalive limits will only make TCP connections

get removed quickly, which will have an effect on idle database sessions,

but not on sessions where the server does not try to communicate with

the client.

Yours,

Laurenz Albe

Hi Laurenz,

thank you for your answer, I suspect how so and now it's more clear now how 
it's working.
By the way, the 
problem, as I show , it's for DWH or similar environment where it's 
usually to have a long running query (with sort / group by / hash , so before the first rows
is returned, it needs time) and in those cases , if the client will die, the backend will run for a lot of time before the backend will try to write to the socket.

So
 it's possible to have a heavy load in a database server with a lot of 
death connections where the backends are still working as described.
I think it's not a minor problem, no?

Kind Regards

Mat