On 2022-08-21 Su 17:15, Tom Lane wrote:
> Andrew Dunstan <andrew@xxxxxxxxxxxx> writes:
>> On 2022-08-20 Sa 23:20, Tom Lane wrote:
>>> Kevin McKibbin <kevinmckibbin123@xxxxxxxxx> writes:
>>>> What's limiting my DB from allowing more connections?
>> The first question in my mind from the above is where this postgres
>> instance is actually listening. Is it really /var/run/postgresql? Its
>> postmaster.pid will tell you. I have often seen client programs pick up
>> a system libpq which is compiled with a different default socket directory.
> I wouldn't think that'd explain a symptom of some connections succeeding
> and others not within the same pgbench run.

Oh, yes, I agree, I missed that aspect of it.

>
> I tried to duplicate this behavior locally (on RHEL8) and got something
> interesting. After increasing the server's max_connections to 1000,
> I can do
>
> $ pgbench -S -c 200 -j 100 -t 100 bench
>
> and it goes through fine. But:
>
> $ pgbench -S -c 200 -j 200 -t 100 bench
> pgbench (16devel)
> starting vacuum...end.
> pgbench: error: connection to server on socket "/tmp/.s.PGSQL.5440" failed: Resource temporarily unavailable
>         Is the server running locally and accepting connections on that socket?
> pgbench: error: could not create connection for client 154
>
> So whatever is triggering this has nothing to do with the server,
> but with how many threads are created inside pgbench. I notice
> also that sometimes it works, making it seem like possibly a race
> condition. Either that or there's some limitation on how fast
> threads within a process can open sockets.
>
> Also, I determined that libpq's connect() call is failing synchronously
> (we get EAGAIN directly from the connect() call, not later). I wondered
> if libpq should accept EAGAIN as a synonym for EINPROGRESS, but no:
> that just makes it fail on the next touch of the socket.
>
> The only documented reason for connect(2) to fail with EAGAIN is
>
>        EAGAIN Insufficient entries in the routing cache.
>
> which seems pretty unlikely to be the issue here, since all these
> connections are being made to the same local address.
>
> On the whole this is smelling more like a Linux kernel bug than
> anything else.
>
>

*nod*

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
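Tom's finding that connect() fails synchronously with EAGAIN, and that treating EAGAIN as a synonym for EINPROGRESS only moves the failure to the next touch of the socket, can be illustrated with a small client-side sketch. This is not libpq's code. It is a minimal sketch, assuming a client that opens a non-blocking AF_UNIX stream socket. It treats EINPROGRESS as "a connection is pending" and EAGAIN as "nothing is pending, so the attempt must be redone on a fresh socket". The names `classify_connect` and `connect_unix_with_retry`, the retry count, and the 10 ms backoff are hypothetical choices for illustration. Whether retrying actually helps depends on what is producing the EAGAIN, which the thread leaves open.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <time.h>
#include <unistd.h>

/* How a single non-blocking connect() attempt turned out. */
typedef enum {
    CONN_DONE,    /* connect() returned 0: connected immediately */
    CONN_PENDING, /* EINPROGRESS: poll() for POLLOUT, then check SO_ERROR */
    CONN_RETRY,   /* EAGAIN: nothing is pending; the attempt must be redone */
    CONN_FAILED   /* any other errno: treat as a hard failure */
} conn_status;

/* Map connect()'s return value and errno onto the cases above. */
static conn_status classify_connect(int rc, int err)
{
    if (rc == 0)
        return CONN_DONE;
    if (err == EINPROGRESS)
        return CONN_PENDING;
    if (err == EAGAIN || err == EWOULDBLOCK)
        return CONN_RETRY;
    return CONN_FAILED;
}

/* Hypothetical helper: connect to the Unix socket at `path`, retrying up to
 * `max_attempts` times with a short sleep whenever connect() fails
 * synchronously with EAGAIN. Returns the fd, or -1 with errno set. */
static int connect_unix_with_retry(const char *path, int max_attempts)
{
    struct sockaddr_un addr;
    struct timespec backoff = {0, 10 * 1000 * 1000}; /* 10 ms, arbitrary */
    int err = 0;

    if (strlen(path) >= sizeof(addr.sun_path)) {
        errno = ENAMETOOLONG;
        return -1;
    }
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);

    for (int attempt = 0; attempt < max_attempts; attempt++) {
        int fd = socket(AF_UNIX, SOCK_STREAM, 0);
        if (fd < 0)
            return -1;

        /* Make the socket non-blocking, as a libpq-style client would. */
        int flags = fcntl(fd, F_GETFL, 0);
        if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
            err = errno;
            close(fd);
            errno = err;
            return -1;
        }

        int rc = connect(fd, (struct sockaddr *) &addr, sizeof(addr));
        err = errno;
        switch (classify_connect(rc, err)) {
        case CONN_DONE:
        case CONN_PENDING: /* caller still has to wait for completion */
            return fd;
        case CONN_RETRY:   /* discard this socket; use a fresh one next time */
            close(fd);
            nanosleep(&backoff, NULL);
            break;
        case CONN_FAILED:
            close(fd);
            errno = err;
            return -1;
        }
    }
    errno = err; /* still EAGAIN after the last attempt */
    return -1;
}
```

A caller would use something like `connect_unix_with_retry("/tmp/.s.PGSQL.5440", 5)`. The design choice is to close the socket and start over on EAGAIN instead of polling it, because, as Tom observed, the EAGAIN socket has no connection in progress to wait for.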
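Andrew's suggestion to check postmaster.pid for where the server is really listening can also be scripted. This is a minimal sketch, assuming PostgreSQL 9.1 or later, where postmaster.pid sits in the data directory and its fourth line holds the port and its fifth line the first Unix-socket directory. The function `read_pidfile_line` and the `PIDFILE_LINE_*` constants are hypothetical names, not part of any PostgreSQL API.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>

/* 1-based line positions in postmaster.pid (PostgreSQL 9.1 and later). */
enum {
    PIDFILE_LINE_PID = 1,
    PIDFILE_LINE_DATADIR = 2,
    PIDFILE_LINE_PORT = 4,
    PIDFILE_LINE_SOCKET_DIR = 5
};

/* Copy line `lineno` of the file at `path` into `buf`, minus its newline.
 * Assumes every line fits in `bufsize` bytes (bufsize must be > 1).
 * Returns 0 on success, -1 if the file cannot be opened or is too short. */
static int read_pidfile_line(const char *path, int lineno,
                             char *buf, size_t bufsize)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    int result = -1;
    for (int i = 1; fgets(buf, (int) bufsize, fp) != NULL; i++) {
        if (i == lineno) {
            buf[strcspn(buf, "\n")] = '\0'; /* strip trailing newline */
            result = 0;
            break;
        }
    }
    fclose(fp);
    return result;
}
```

Comparing the `PIDFILE_LINE_SOCKET_DIR` value with the directory in the client's error message (here `/tmp`) shows whether the client and server agree on the socket location. As Tom noted, that mismatch would not explain some connections succeeding and others failing within one pgbench run.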