Re: errors with high connections rate

Craig Ringer <ringerc@xxxxxxxxxxxxx> · Tue, 03 Jul 2012 21:27:24 +0800

On 07/03/2012 04:26 PM, Pawel S. Veselov wrote:

> That's the thing, no segfaults (dmesg), nothing in the server logs.
>
> It may as well be some sort of an anti-fork-bomb measure, only judging 
> by the fact that with enough attempts, things do clear out, though I 
> wish there would be some indication of that, and I'm still confused 
> about the error code being ENOTCONN.

I've managed to produce the endpoint not connected errors with a little 
test I wrote here. Only once so far and only during an abnormal test run 
where I signalled the test workers as they were starting up, so that's 
not really very helpful.

I have no problem using a little Python test program to create 800 
connections in about a second. It forks some workers (100 by default) 
which grab enough connections each to reach the target connection count.
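
For illustration, a minimal sketch of that kind of test (not the actual 
program used here). It assumes psycopg2 is installed and that the DSN 
"dbname=postgres" works on your machine; the function names, the DSN and 
the hold time are all made up for this example.

```python
import os
import sys
import time


def split_connections(target, workers):
    """Divide `target` connections as evenly as possible among `workers`."""
    base, extra = divmod(target, workers)
    return [base + (1 if i < extra else 0) for i in range(workers)]


def pg_connect():
    # Assumption: psycopg2 is available and this DSN matches your setup.
    import psycopg2
    return psycopg2.connect("dbname=postgres")


def worker(count, connect, hold_seconds):
    """Open `count` connections, hold them briefly, then close them all."""
    conns = []
    try:
        for _ in range(count):
            conns.append(connect())
        time.sleep(hold_seconds)
    finally:
        for conn in conns:
            conn.close()


def run(target=800, workers=100, connect=pg_connect, hold_seconds=5):
    """Fork one process per worker; return how many exited abnormally."""
    pids = []
    for count in split_connections(target, workers):
        pid = os.fork()
        if pid == 0:
            # Child process: do the work, then exit without returning.
            status = 0
            try:
                worker(count, connect, hold_seconds)
            except Exception as exc:
                print("worker %d: %s" % (os.getpid(), exc),
                      file=sys.stderr, flush=True)
                status = 1
            os._exit(status)
        pids.append(pid)

    failures = 0
    for pid in pids:
        _, status = os.waitpid(pid, 0)
        if status != 0:
            failures += 1
    return failures
```

Calling run() with the defaults gives each of the 100 workers 8 
connections. If the target exceeds max_connections, the surplus workers 
should fail with "too many clients already" and be counted in the 
return value.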

Ooh, handy. I just triggered it again now. The "Transport endpoint is 
not connected" messages were intermixed with some "FATAL:  sorry, too 
many clients already" messages. The PostgreSQL log is full of "FATAL:  
sorry, too many clients already" messages intermixed with "LOG:  
unexpected EOF on client connection" messages. Again, it was an abnormal 
run where I signalled my workers midway through startup.

Interesting, that. I've never seen it on a run where I don't send a 
signal. You know what that makes me think? You're using a multithreaded 
approach, and there's something going wrong in your app's innards. Yes, 
that's a lot of hot air and handwaving, but it fits - you're getting an 
error saying that psql is trying to operate on a socket that isn't there.

The fact that there's nothing in the system logs or Pg logs just adds 
weight to that. I'm guessing you have a threading bug, possibly signal 
related.

--
Craig Ringer
