In case anybody looks back on this thread in the future, I fixed the problem (at least temporarily) by running the following in CentOS 7:
# echo "1025 65535" >/proc/sys/net/ipv4/ip_local_port_range
# sysctl -w net.ipv4.tcp_timestamps=1
# sysctl -w net.ipv4.tcp_tw_recycle=1
# sysctl -w tcp_tw_reuse=1
On both the database server and the client nodes. It seems the clients were eating up all the available ephemeral ports opening and closing database connections. I'm going to have a long talk with the developer about why this is done in the first place :)
Cheers, and thanks for the tips.
--Steve
On Tue, Mar 29, 2016 at 6:51 PM Adrian Klaver <adrian.klaver@xxxxxxxxxxx> wrote:
On 03/29/2016 03:25 PM, Stephen Constable wrote:
> Sorry, my client environment is Linux.
Hmm, so I was reading win32.c wrong. It is mapping a Windows error
message to that string.
>
> My current theory is that my clients are running out of available
> ephemeral ports, like in this thread:
> http://dba.stackexchange.com/questions/59650/pgbouncer-works-great-but-occasionally-becomes-unavailable (but
> I"m not currently using pg bouncer). I tried pg bouncer before and had
> the same errors, which in retrospect makes the client-side issue seem
> more likely. Are there any configuration variables I can set to reduce
> the number of ephemeral ports required in the postgresql client
> libraries? Otherwise, I will attempt to reconfigure the OS of the
> client machines tomorrow morning.
Not sure how that would work. To make a network connection would seem to
me to require a port.
Are you seeing the same sort of port churn on your 8.4 machine?
Is the fact that is processing results slower maybe giving the ports a
chance to timeout their wait time, versus not on the newer faster machine?
The issue, to me at least, seems to be less the number of jobs, but the
number of connections each job is producing. T
>
> Thanks,
> Steve
>
> On Tue, Mar 29, 2016 at 4:44 PM Adrian Klaver <adrian.klaver@xxxxxxxxxxx
> <mailto:adrian.klaver@xxxxxxxxxxx>> wrote:
>
> On 03/29/2016 01:28 PM, Stephen Constable wrote:
> > My apologies, I'm not sure what part of the networking stack the
> > messages are coming from. It also states:
> > """
> > could not connect to server: Cannot assign requested address
> > Is the server running on host "<hostname>" and accepting
> > TCP/IP connections on port <port>?
> > """
>
> Alright I lied, the above is a Postgres error message. I am just not
> used to seeing 'Cannot assign requested address'. Turns out it is in
> interfaces/libpq/win32.c.
>
> So your client is running on Windows?
>
>
> > This error is only printed under a 32-job load, never a single
> job load.
> >
> > The processes are indeed connecting over a local network.
> >
> > I have only enabled the logging of connections and disconnections
> since
> > I figured that would be the most telling :) perhaps that was not the
> > best idea. but, FYI, I see over 5000 such notices in a single
> minute.
> > I will reconfigure the logging to be more verbose.
> >
> > Thanks,
> > Steve
> >
> > On Tue, Mar 29, 2016 at 4:21 PM Adrian Klaver
> <adrian.klaver@xxxxxxxxxxx <mailto:adrian.klaver@xxxxxxxxxxx>
> > <mailto:adrian.klaver@xxxxxxxxxxx
> <mailto:adrian.klaver@xxxxxxxxxxx>>> wrote:
> >
> > On 03/29/2016 01:10 PM, Stephen Constable wrote:
> > > Hi All,
> > >
> > > I'm a new-ish sysadmin working on porting legacy
> scientific code
> > from a
> > > local server/client to new supercomputer environment. My
> work is
> > mostly
> > > done, except that my postgres database doesn't seem to be
> able to
> > keep
> > > up with the new environment. The application is written
> in-house
> > in a
> > > mixture of FORTAN 77 and C, and uses postgres BLOBS as its
> main data
> > > store. This application in particular only reads from the
> > database, it
> > > never writes, which *should* make it easy to scale.
> > >
> > > My main problem is that this client application is unable to
> > connect to
> > > the database under a modest load (32 simultaneous jobs).
> The client
> > > error logs print out messages like "could not connect to
> server:
> > Cannot
> > > assign requested address" and "Cannot connect to database
> > [runlog]!!!"
> > > (an important database of ours). The "cannot assign requested
> > address"
> >
> > Well those do not look like Postgres error messages to me, so
> the first
> > thing would be to determine what part of the stack is
> generating them.
> >
> > Is the client software connecting to the database over a network?
> >
> > Are you using connection pooling?
> >
> > > message makes me think it's a configuration issue. The
> logs are
> > flooded
> > > with hundreds of connection and disconnection notices per
> > second. This
> >
> > Might want to turn off logging connections/disconnections:
> >
> >
> http://www.postgresql.org/docs/9.4/interactive/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-WHAT
> >
> > log_connections (boolean)
> >
> > log_disconnections (boolean)
> >
> > > same code and configuration runs fine on our mid-2000's
> Solaris
> > 10 box
> > > with postgres 8.4 (albeit very slowly) but totally fails
> with these
> > > connection errors on a modern Dell system running CentOS 7 or
> > FreeBSD 10
> > > (I tested both) with postgres 9.4.
> > >
> > > While the database is under load (and jobs are actively
> failing),
> > select
> > > count(*) from pg_stat_activity returns 30-34 ish
> connections, show
> > > max_connections returns 100, and show
> superuser_reserved_connections
> > > shows 3. My only other hint is that right after a fresh
> install of
> > > CentOS 7 my job success rate was around 50%, and now it has
> > approached
> > > approximately 5%, so something is changing over time.
> > >
> > > Does anyone have any advice or experience with similar issues?
> >
> > What else does the Postgres log show besides the
> > connections/disconnections, that might be of interest?
> >
> > What does the system log show?
> >
> > >
> > > Thanks,
> > > Steve
> > >
> >
> >
> > --
> > Adrian Klaver
> > adrian.klaver@xxxxxxxxxxx <mailto:adrian.klaver@xxxxxxxxxxx>
> <mailto:adrian.klaver@xxxxxxxxxxx <mailto:adrian.klaver@xxxxxxxxxxx>>
> >
>
>
> --
> Adrian Klaver
> adrian.klaver@xxxxxxxxxxx <mailto:adrian.klaver@xxxxxxxxxxx>
>
--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx