My apologies, I'm not sure what part of the networking stack the messages are coming from. It also states:
"""
could not connect to server: Cannot assign requested address
Is the server running on host "<hostname>" and accepting
TCP/IP connections on port <port>?
"""
This error is only printed under a 32-job load, never a single job load.
The processes are indeed connecting over a local network.
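
One way to narrow down which layer produces the text: the "Is the server running on host ... and accepting TCP/IP connections on port ...?" wording looks like the message libpq prints for a failed TCP connection, with the operating system's own error description placed after the first colon. The sketch below is only a rough check under that assumption. It looks up which errno names on the local machine carry a given description; the helper name errno_names_for is made up for illustration. On Linux with glibc I would expect the string to map to EADDRNOTAVAIL, which would point at the client side of the socket rather than at the server.

```python
import errno
import os


def errno_names_for(message):
    """Return the symbolic errno names whose OS description equals `message`."""
    return sorted(
        name
        for code, name in errno.errorcode.items()
        if os.strerror(code) == message
    )


# The description text is platform-specific, so run this on the client host.
print(errno_names_for("Cannot assign requested address"))
```

An empty list would simply mean the local C library words the error differently, as FreeBSD does.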
Thanks,
Steve
On Tue, Mar 29, 2016 at 4:21 PM Adrian Klaver <adrian.klaver@xxxxxxxxxxx> wrote:
On 03/29/2016 01:10 PM, Stephen Constable wrote:
> Hi All,
>
> I'm a new-ish sysadmin working on porting legacy scientific code from a
> local server/client to a new supercomputer environment. My work is mostly
> done, except that my postgres database doesn't seem to be able to keep
> up with the new environment. The application is written in-house in a
> mixture of FORTRAN 77 and C, and uses postgres BLOBS as its main data
> store. This application in particular only reads from the database, it
> never writes, which *should* make it easy to scale.
>
> My main problem is that this client application is unable to connect to
> the database under a modest load (32 simultaneous jobs). The client
> error logs print out messages like "could not connect to server: Cannot
> assign requested address" and "Cannot connect to database [runlog]!!!"
> (an important database of ours). The "cannot assign requested address"
Well, those do not look like Postgres error messages to me, so the first
thing would be to determine what part of the stack is generating them.
Is the client software connecting to the database over a network?
Are you using connection pooling?
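
For readers unfamiliar with the pooling question, here is a minimal sketch of the idea, not the application's actual code (which is FORTRAN 77 and C). A fixed set of connections is opened once and reused, so the number of connects does not grow with the number of requests. The class name SimplePool and the connect callable are hypothetical; connect stands in for whatever function opens a database connection.

```python
import queue
from contextlib import contextmanager


class SimplePool:
    """Keep a fixed number of connections open and hand them out on request."""

    def __init__(self, connect, size):
        # `connect` is any zero-argument callable that returns a connection.
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())

    @contextmanager
    def connection(self, timeout=None):
        # Block until a connection is free (or raise queue.Empty on timeout).
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Return the connection to the pool instead of closing it.
            self._idle.put(conn)
```

A pool like this only helps inside one process. For many separate job processes, an external pooler such as PgBouncer plays the same role.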
> message makes me think it's a configuration issue. The logs are flooded
> with hundreds of connection and disconnection notices per second. This
Might want to turn off logging connections/disconnections:
http://www.postgresql.org/docs/9.4/interactive/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-WHAT
log_connections (boolean)
log_disconnections (boolean)
> same code and configuration runs fine on our mid-2000s Solaris 10 box
> with postgres 8.4 (albeit very slowly) but totally fails with these
> connection errors on a modern Dell system running CentOS 7 or FreeBSD 10
> (I tested both) with postgres 9.4.
>
> While the database is under load (and jobs are actively failing), select
> count(*) from pg_stat_activity returns 30-34 ish connections, show
> max_connections returns 100, and show superuser_reserved_connections
> shows 3. My only other hint is that right after a fresh install of
> CentOS 7 my job success rate was around 50%, and now it has approached
> approximately 5%, so something is changing over time.
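
The three checks quoted above can be combined into one headroom figure. This is a sketch that assumes a Python DB-API cursor connected to the server is available (for example from a PostgreSQL driver); the function name connection_headroom is made up for illustration.

```python
def connection_headroom(cursor):
    """Return (in_use, slots_left_for_ordinary_users) via a DB-API cursor."""

    def scalar(sql):
        cursor.execute(sql)
        return cursor.fetchone()[0]

    # SHOW returns its value as text, so convert everything to int.
    in_use = int(scalar("SELECT count(*) FROM pg_stat_activity"))
    max_conn = int(scalar("SHOW max_connections"))
    reserved = int(scalar("SHOW superuser_reserved_connections"))
    return in_use, max_conn - reserved - in_use
```

With the figures reported here (about 34 in use, max_connections 100, 3 reserved) that leaves roughly 63 free slots, which suggests the server-side connection limit is probably not what is refusing the jobs.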
>
> Does anyone have any advice or experience with similar issues?
What else does the Postgres log show besides the
connections/disconnections, that might be of interest?
What does the system log show?
>
> Thanks,
> Steve
>
--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx