Re: pgbench: could not connect to server: Resource temporarily unavailable

Andrew Dunstan <andrew@xxxxxxxxxxxx> · Sun, 21 Aug 2022 17:26:55 -0400

On 2022-08-21 Su 17:15, Tom Lane wrote:
> Andrew Dunstan <andrew@xxxxxxxxxxxx> writes:
>> On 2022-08-20 Sa 23:20, Tom Lane wrote:
>>> Kevin McKibbin <kevinmckibbin123@xxxxxxxxx> writes:
>>>> What's limiting my DB from allowing more connections?
>> The first question in my mind from the above is where this postgres
>> instance is actually listening. Is it really /var/run/postgresql? Its
>> postmaster.pid will tell you. I have often seen client programs pick up
>> a system libpq which is compiled with a different default socket directory.
> I wouldn't think that'd explain a symptom of some connections succeeding
> and others not within the same pgbench run.

Oh, yes, I agree. I missed that aspect of it.

>
> I tried to duplicate this behavior locally (on RHEL8) and got something
> interesting.  After increasing the server's max_connections to 1000,
> I can do
>
> $ pgbench -S -c 200 -j 100 -t 100 bench
>
> and it goes through fine.  But:
>
> $ pgbench -S -c 200 -j 200 -t 100 bench
> pgbench (16devel)
> starting vacuum...end.
> pgbench: error: connection to server on socket "/tmp/.s.PGSQL.5440" failed: Resource temporarily unavailable
>         Is the server running locally and accepting connections on that socket?
> pgbench: error: could not create connection for client 154
>
> So whatever is triggering this has to do not with the server
> but with how many threads are created inside pgbench.  I notice
> also that sometimes it works, making it seem like possibly a race
> condition.  Either that or there's some limitation on how fast
> threads within a process can open sockets.
>
> Also, I determined that libpq's connect() call is failing synchronously
> (we get EAGAIN directly from the connect() call, not later).  I wondered
> if libpq should accept EAGAIN as a synonym for EINPROGRESS, but no:
> that just makes it fail on the next touch of the socket.
>
> The only documented reason for connect(2) to fail with EAGAIN is
>
>        EAGAIN Insufficient entries in the routing cache.
>
> which seems pretty unlikely to be the issue here, since all these
> connections are being made to the same local address.
>
> On the whole this is smelling more like a Linux kernel bug than
> anything else.
>

*nod*
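
For anyone who wants to look at the synchronous-EAGAIN observation outside
libpq, here is a minimal sketch, not libpq's actual code. It makes one
non-blocking connect() to a Unix-domain socket and labels the resulting
errno, keeping EAGAIN apart from EINPROGRESS as described above. The helper
names (classify_connect_errno, try_nonblocking_connect) and the labels are
made up for illustration, and it assumes a Linux/POSIX system.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Hypothetical helper: label the errno seen right after a non-blocking
 * connect().  EINPROGRESS means "finish later"; EAGAIN is the
 * synchronous failure discussed in this thread.
 */
static const char *
classify_connect_errno(int err)
{
    switch (err)
    {
        case 0:
            return "connected";
        case EINPROGRESS:
            return "in progress";
        case EINTR:
            return "interrupted";
        case EAGAIN:
            return "failed synchronously (EAGAIN)";
        default:
            return "failed";
    }
}

/*
 * Hypothetical helper: try one non-blocking connect() to the Unix-domain
 * socket at "path".  Returns 0 on immediate success, else the errno of
 * the first failing call.  The socket is always closed before returning.
 */
static int
try_nonblocking_connect(const char *path)
{
    struct sockaddr_un addr;
    int         fd;
    int         err;

    assert(path != NULL);
    if (strlen(path) >= sizeof(addr.sun_path))
        return ENAMETOOLONG;

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return errno;

    if (fcntl(fd, F_SETFL, O_NONBLOCK) < 0)
    {
        err = errno;
        close(fd);
        return err;
    }

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);    /* length checked above */

    if (connect(fd, (struct sockaddr *) &addr, sizeof(addr)) == 0)
        err = 0;
    else
        err = errno;                /* captured before close() can change it */

    close(fd);
    return err;
}
```

Calling try_nonblocking_connect("/tmp/.s.PGSQL.5440") from many threads at
once and tallying the classify_connect_errno() results would be one way to
see whether EAGAIN shows up without libpq or pgbench in the picture. The
socket path is just the one from the error message above.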

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com