Re: How to handle missing pgsql.connect_timeout




On Wed, 2 Feb 2005, Christopher Murtagh wrote:

On Wed, 2005-02-02 at 09:49 +0100, Janning Vygen wrote:
It's not broken anymore! And it speeds up your website.

Well it might not be broken anymore, but I haven't seen any evidence of
a speed increase (quite the opposite in fact). What we did instead was
write a very small abstraction layer so that any page request would only
use 1 database connection. We're running on dual 2.8 GHz Xeons for our
web server and db machine and taking on over 500,000 hits per day. Some
of our pages are very data intensive (http://www.mcgill.ca/calendar/ for
example).
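
A rough sketch of what such a tiny layer might look like (function names and
credentials are made up): one shared connection per page request, opened
lazily on first use, so every query on the page reuses it.

  <?php
  // Hypothetical helper: returns the single connection for this page request.
  function db()
  {
      static $conn = null;
      if ($conn === null) {
          // made-up host and credentials; adjust to your setup
          $conn = pg_connect('host=dbhost dbname=app user=web password=secret');
      }
      return $conn;
  }

  // All page code funnels its queries through this wrapper.
  function db_query($sql)
  {
      return pg_query(db(), $sql);
  }
  ?>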

At the moment it waits "forever" (until max_execution_time, I guess) and you
have no chance to show an error message like: DB is overloaded, please try
again.
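
For what it's worth, libpq itself understands a connect_timeout parameter in
the connection string, so even without a pgsql.connect_timeout INI setting
you can fail fast and show your own message. A minimal sketch (host and
credentials are made up):

  <?php
  // connect_timeout is handled by libpq; 2 seconds here is just an example.
  $conn = @pg_connect('host=dbhost dbname=app user=web password=secret connect_timeout=2');
  if (!$conn) {
      die('DB is overloaded, please try again later.');
  }
  ?>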

Maybe pg_pconnect() isn't all it's cracked up to be? Anyone else here
have good experiences with it?

You have to evaluate your setup carefully. I think there's little hope
of planning this in advance; you'll have to monitor your system at runtime.

I've been using persistent connections a lot. First, think of the problem
you're trying to solve: reducing connection overhead, since otherwise at
least one connection is made to the DB per page. Now consider the following:

- PostgreSQL forks a backend process for each connection: this is
  almost universally considered a huge overhead, but on modern operating
  systems forking is cheap, so this is no longer really the case.

- network latency: if the PG server runs on a different host, it may be
  an issue. If it runs on the same host, and is accessed via Unix sockets,
  latency is negligible.

- in order to give a persistent connection the semantics of a fresh
  connect, PHP has to 'reset' the session, which is a kind of overhead in
  itself. Consider the same Apache backend serving two different PG users!
  The connection may be the same, but all local and remote state has to be
  removed.

- if your pages use multiple db connections, you'll get many PG backends
  per Apache backend (see the sketch after this list).

- each PG backend uses up some resources (RAM mostly): there are a number
  of 'knobs' you can turn to control that, but of course, unless your RAM
  is infinite, you may have to _reduce_ them in order to increase the
  number of concurrent backends.
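
To make the "many backends per Apache backend" point concrete, here's a
rough sketch (databases, users and passwords are made up). PHP keys
persistent connections on the connection string, so each distinct string
keeps its own backend alive in every Apache child:

  <?php
  // Two distinct connection strings on one page: each Apache child that
  // serves this page ends up holding TWO persistent PG backends.
  $app   = pg_pconnect('host=dbhost dbname=app   user=web    password=secret');
  $stats = pg_pconnect('host=dbhost dbname=stats user=report password=secret');

  // With, say, 100 Apache children, that's up to 200 idle backends,
  // each one eating RAM on the DB server.
  ?>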

To put it very simply (maybe too simply), it's a CPU vs RAM tradeoff.
Using persistent connections turns a little (very OS and setup dependent)
CPU overhead into a not so little RAM overhead on the PG server side.

I wasn't able to find a definitive answer. I have many concurrent accesses
(say 200-300), with simple (read and write) queries, but on a large data
set. I have two Apache frontends, and one PG server (same LAN). I wasn't
able to choose between pconnect and connect. At times, some PHP programmer
kills the DB, and I have to teach him not to perform 300 different queries
_per page_ with a PHP for loop, and to use table joins instead.
I can tell you that when that happens, you'll forget about pconnect or
connect, and thank the PG developers for writing such a robust application
(I've seen load averages up to 100, and PG was still doing its job, even
if very slowly). :-)
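
In case it isn't obvious what that anti-pattern looks like, here's a rough
sketch with made-up table names (orders, customers): one query per row in a
PHP loop versus a single join.

  <?php
  $db = pg_pconnect('host=dbhost dbname=app user=web password=secret');

  // Anti-pattern: one extra query per row, 300 round trips to the server.
  $res = pg_query($db, 'SELECT id, customer_id FROM orders LIMIT 300');
  while ($order = pg_fetch_assoc($res)) {
      $cust = pg_query($db,
          'SELECT name FROM customers WHERE id = ' . (int) $order['customer_id']);
      // ... use $cust ...
  }

  // Better: one query, let the database do the join.
  $res = pg_query($db,
      'SELECT o.id, c.name
         FROM orders o
         JOIN customers c ON c.id = o.customer_id
        LIMIT 300');
  ?>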

.TM.
--
      ____/  ____/   /
     /      /       /			Marco Colombo
    ___/  ___  /   /		      Technical Manager
   /          /   /			 ESI s.r.l.
 _____/ _____/  _/		       Colombo@xxxxxx

