Connections getting stuck sending data to client

Chris Butler <chrisb@xxxxxxxxxxx> · Fri, 31 Oct 2008 15:36:35 +0000

I've been having intermittent problems with our DB server (running
postgresql 8.3.3) reaching its connection limit, all because of a SELECT
statement that's stuck while sending data. This gets stuck because there's a
transaction waiting to do an ALTER TABLE, then the subsequent SELECTs wait
for the ALTER.

The problem seems to be that the client connection drops while the server is
sending data to the client. On the server I end up with:

Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address               Foreign Address             State
[...]
tcp        1  17376 db.zedcore.com:postgres     portia.healthjobsuk.c:36520 CLOSE_WAIT

By the time our monitoring system has picked up the problem, netstat on the
client contains no trace of the offending port.

As you can see, the Send-Q is quite full. Also, if I strace the postgres
process, I get:

[root@db2:~]# strace -fp 13572
Process 13572 attached - interrupt to quit
sendto(8, "\0\tPermanent\0\0\0\0010\0\0\0\0010\0\0\0\n1225843"..., 7200, 0, NULL, 0) = ? ERESTARTSYS (To be restarted)
--- SIGINT (Interrupt) @ 0 (0) ---
rt_sigreturn(0x2)                       = 44
sendto(8, "\0\tPermanent\0\0\0\0010\0\0\0\0010\0\0\0\n1225843"..., 7200, 0, NULL, 0) = ? ERESTARTSYS (To be restarted)
--- SIGINT (Interrupt) @ 0 (0) ---
rt_sigreturn(0x2)                       = 44
sendto(8, "\0\tPermanent\0\0\0\0010\0\0\0\0010\0\0\0\n1225843"..., 7200, 0, NULL, 1) = ? ERESTARTSYS (To be restarted)
--- SIGINT (Interrupt) @ 0 (0) ---
rt_sigreturn(0x2)                       = 44
sendto(8, "\0\tPermanent\0\0\0\0010\0\0\0\0010\0\0\0\n1225843"..., 7200, 0, NULL, 0) = ? ERESTARTSYS (To be restarted)
--- SIGINT (Interrupt) @ 0 (0) ---
rt_sigreturn(0x2)                       = 44
sendto(8, "\0\tPermanent\0\0\0\0010\0\0\0\0010\0\0\0\n1225843"..., 7200, 0, NULL, 0) = ? ERESTARTSYS (To be restarted)
--- SIGINT (Interrupt) @ 0 (0) ---
rt_sigreturn(0x2)                       = 44
sendto(8, "\0\tPermanent\0\0\0\0010\0\0\0\0010\0\0\0\n1225843"..., 7200, 0, NULL, 0) = ? ERESTARTSYS (To be restarted)
--- SIGTERM (Terminated) @ 0 (0) ---
rt_sigreturn(0xf)                       = 44
sendto(8, "\0\tPermanent\0\0\0\0010\0\0\0\0010\0\0\0\n1225843"..., 7200, 0, NULL, 0) = ? ERESTARTSYS (To be restarted)
--- SIGTERM (Terminated) @ 0 (0) ---
rt_sigreturn(0xf)                       = 44
sendto(8, "\0\tPermanent\0\0\0\0010\0\0\0\0010\0\0\0\n1225843"..., 7200, 0, NULL, 0) = ? ERESTARTSYS (To be restarted)
--- SIGTERM (Terminated) @ 0 (0) ---
rt_sigreturn(0xf)                       = 44
sendto(8, "\0\tPermanent\0\0\0\0010\0\0\0\0010\0\0\0\n1225843"..., 7200, 0, NULL, 0

It appears that the server is blocking on the send operation, I guess
because the send buffer is already full. Also, I was unable to kill the
backend process with either SIGINT or TERM (possibly a bug?).

A tcpdump of traffic for that port:

14:34:07.975493 IP db.zedcore.com.postgres > portia.healthjobsuk.com.36520: . 2086302610:2086304058(1448) ack 381974125 win 78 <nop,nop,timestamp 3025282443 595580714>
14:34:07.976483 IP portia.healthjobsuk.com > db.zedcore.com: ICMP portia.healthjobsuk.com tcp port 36520 unreachable, length 556
14:36:07.960308 IP db.zedcore.com.postgres > portia.healthjobsuk.com.36520: . 0:1448(1448) ack 1 win 78 <nop,nop,timestamp 3025402443 595580714>
14:36:07.961565 IP portia.healthjobsuk.com > db.zedcore.com: ICMP portia.healthjobsuk.com tcp port 36520 unreachable, length 556

A bit of googling led me towards changing the TCP keepalive settings in
postgresql.conf, although I don't think they've made any difference.

Does anyone have any ideas what's happening, and whether there's anything I
can do to stop the problem?

-- 
Chris Butler
Zedcore Systems Ltd
UK tel: 0114 238 1828 ext 72

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general