On Thu, Jan 15, 2009 at 10:54 AM, Kirk Strauser <kirk@xxxxxxxxxxxx> wrote:
> I have a PostgreSQL 8.3.5 server with max_connections = 400.  At this
> moment, I have 223 open connections, including 64 from a bunch of
> webserver processes and about 100 from desktop machines running a
> particular application.  The rest are from various scheduled processes
> and other assorted things.  Now, I know there are projects like pgpool-II
> that can serve to pool connections to the server.  Why would I want to do
> that, though?
>
> I understand why pooling within a process itself is a good thing.
> However, say I have two users running the same program on different
> desktop machines.  At present, those applications connect with the same
> username/password that's tied to the program and not the actual user.  It
> seems like if Abby and Barb end up sharing the same connection from the
> pool, and Abby runs some giant report query, then Barb would get held
> back while she waits for it to finish.  Is that true?  Even if not, what
> would be the advantage in the two of them sharing a connection?

There tend to be three effects that provide benefits:

1.  Fewer connections tend to consume fewer resources on the DBMS server.

Each connection consumes some resources of its own (memory, lock table
entries, and such), so having fewer connections means that the aggregate
size of the postmaster processes is likely to be smaller.

2.  Pooling lets you use and re-use connections, which reduces the work
spent building connections up and tearing them down.

Each PostgreSQL connection is handled by a separate OS process; if the
connection pool is passing the same connection from user to user, your
system is doing less work spawning backend processes, doing
authentication, and otherwise getting from fork() to "ready to handle
queries."  (There is a minimal sketch of this sort of reuse in the P.S.
below.)

3.  There is only so much *genuine* concurrency that you can actually get
out of your DB server, and there is only limited value in having more
backend processes than that "emergent quantity."

For instance, if you only have a single CPU and a single disk drive, then
your computer is only ever *truly* doing one thing at a time.  Trying to
make such a server service 200 connections, each trying to do work, means
that the server spends a great deal of its time switching from process to
process, doing context switches.  That's an extreme point, of course, but
it should be reasonably intuitive to consider that...

- If you have 10 CPUs and a RAID array of 10 disk drives, then that host
  can likely cope comfortably with doing ~10 things at once;
- Change those numbers to 20/20 and the intuition continues to hold.

If some of your 200 connections are only used intermittently, then with a
connection pool holding 20 "real" connections the 200 users would seldom
notice delays due to sharing, and the DB server would have far fewer
processes kicking around consuming memory.  You might well be better off
putting the memory that the extra 180 backends would have used towards
shared cache :-).  (The P.P.S. below shows roughly what that
200-clients-over-20-backends arrangement looks like in a pooler's
configuration.)
--
http://linuxfinances.info/info/linuxdistributions.html
Joe E. Lewis - "There's only one thing money won't buy, and that is
poverty."
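
P.S.  Re point 2, here is a minimal sketch of that kind of reuse on the
client side, assuming Python and psycopg2's ThreadedConnectionPool; the
connection string and pool sizes are made up for the example:

    from psycopg2.pool import ThreadedConnectionPool

    # Up to 20 "real" backends shared by every thread in this process,
    # instead of each caller opening (fork + authenticate) its own.
    pool = ThreadedConnectionPool(2, 20, "host=dbhost dbname=app user=app")

    def run_query(sql, params=None):
        conn = pool.getconn()        # borrow an already-open backend
        try:
            cur = conn.cursor()
            cur.execute(sql, params)
            rows = cur.fetchall()
            conn.commit()            # don't leave it idle-in-transaction
            return rows
        finally:
            pool.putconn(conn)       # hand it back for the next caller

After the first few calls, the backends are already sitting there warmed
up; nobody pays the fork()/authentication cost again.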
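
P.P.S.  Re point 3, the 200-clients-over-20-backends arrangement is what
an external pooler such as pgpool-II gives you.  Purely as an
illustration (with made-up numbers), the equivalent in pgbouncer, another
pooler, looks roughly like this in pgbouncer.ini:

    [databases]
    app = host=127.0.0.1 port=5432 dbname=app

    [pgbouncer]
    listen_addr = *
    listen_port = 6432
    auth_type = md5
    auth_file = /etc/pgbouncer/userlist.txt
    pool_mode = transaction   ; a backend is held only for one transaction
    max_client_conn = 200     ; all the desktops/webservers connect here...
    default_pool_size = 20    ; ...but only ~20 real backends per db/user pair

The 200 client connections are cheap for the pooler to hold; only the 20
server connections show up as postmaster children, so the memory the
other 180 would have eaten is free for shared buffers and OS cache.
(Transaction pooling does assume the applications don't depend on
session state such as temp tables or session-level prepared statements.)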