I've been having intermittent problems with our DB server (running postgresql 8.3.3) reaching its connection limit, all because of a SELECT statement that's stuck while sending data. This gets stuck because there's a transaction waiting to do an ALTER TABLE, then the subsequent SELECTs wait for the ALTER. The problem seems to be that the client connection drops while the server is sending data to the client. On the server I end up with: Active Internet connections (servers and established) Proto Recv-Q Send-Q Local Address Foreign Address State [...] tcp 1 17376 db.zedcore.com:postgres portia.healthjobsuk.c:36520 CLOSE_WAIT By the time our monitoring system has picked up the problem, netstat on the client contains no trace of the offending port. As you can see, the Send-Q is quite full. Also, if I strace the postgres process, I get: [root@db2:~]# strace -fp 13572 Process 13572 attached - interrupt to quit sendto(8, "\0\tPermanent\0\0\0\0010\0\0\0\0010\0\0\0\n1225843"..., 7200, 0, NULL, 0) = ? ERESTARTSYS (To be restarted) --- SIGINT (Interrupt) @ 0 (0) --- rt_sigreturn(0x2) = 44 sendto(8, "\0\tPermanent\0\0\0\0010\0\0\0\0010\0\0\0\n1225843"..., 7200, 0, NULL, 0) = ? ERESTARTSYS (To be restarted) --- SIGINT (Interrupt) @ 0 (0) --- rt_sigreturn(0x2) = 44 sendto(8, "\0\tPermanent\0\0\0\0010\0\0\0\0010\0\0\0\n1225843"..., 7200, 0, NULL, 1) = ? ERESTARTSYS (To be restarted) --- SIGINT (Interrupt) @ 0 (0) --- rt_sigreturn(0x2) = 44 sendto(8, "\0\tPermanent\0\0\0\0010\0\0\0\0010\0\0\0\n1225843"..., 7200, 0, NULL, 0) = ? ERESTARTSYS (To be restarted) --- SIGINT (Interrupt) @ 0 (0) --- rt_sigreturn(0x2) = 44 sendto(8, "\0\tPermanent\0\0\0\0010\0\0\0\0010\0\0\0\n1225843"..., 7200, 0, NULL, 0) = ? ERESTARTSYS (To be restarted) --- SIGINT (Interrupt) @ 0 (0) --- rt_sigreturn(0x2) = 44 sendto(8, "\0\tPermanent\0\0\0\0010\0\0\0\0010\0\0\0\n1225843"..., 7200, 0, NULL, 0) = ? ERESTARTSYS (To be restarted) --- SIGTERM (Terminated) @ 0 (0) --- rt_sigreturn(0xf) = 44 sendto(8, "\0\tPermanent\0\0\0\0010\0\0\0\0010\0\0\0\n1225843"..., 7200, 0, NULL, 0) = ? ERESTARTSYS (To be restarted) --- SIGTERM (Terminated) @ 0 (0) --- rt_sigreturn(0xf) = 44 sendto(8, "\0\tPermanent\0\0\0\0010\0\0\0\0010\0\0\0\n1225843"..., 7200, 0, NULL, 0) = ? ERESTARTSYS (To be restarted) --- SIGTERM (Terminated) @ 0 (0) --- rt_sigreturn(0xf) = 44 sendto(8, "\0\tPermanent\0\0\0\0010\0\0\0\0010\0\0\0\n1225843"..., 7200, 0, NULL, 0 It appears that the server is blocking on the send operation, I guess because the send buffer is already full. Also, I was unable to kill the backend process with either SIGINT or TERM (possibly a bug?). A tcpdump of traffic for that port: 14:34:07.975493 IP db.zedcore.com.postgres > portia.healthjobsuk.com.36520: . 2086302610:2086304058(1448) ack 381974125 win 78 <nop,nop,timestamp 3025282443 595580714> 14:34:07.976483 IP portia.healthjobsuk.com > db.zedcore.com: ICMP portia.healthjobsuk.com tcp port 36520 unreachable, length 556 14:36:07.960308 IP db.zedcore.com.postgres > portia.healthjobsuk.com.36520: . 0:1448(1448) ack 1 win 78 <nop,nop,timestamp 3025402443 595580714> 14:36:07.961565 IP portia.healthjobsuk.com > db.zedcore.com: ICMP portia.healthjobsuk.com tcp port 36520 unreachable, length 556 A bit of googling led me towards changing the TCP keepalive settings in postgresql.conf, although I don't think they've made any difference. Does anyone have any ideas what's happening, and whether there's anything I can do to stop the problem? -- Chris Butler Zedcore Systems Ltd UK tel: 0114 238 1828 ext 72 -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general