Re: Scalability in postgres

Scott Carey <scott@xxxxxxxxxxxxxxxxx> · Wed, 3 Jun 2009 14:09:45 -0700

On 6/3/09 11:39 AM, "Robert Haas" <robertmhaas@xxxxxxxxx> wrote:

> On Wed, Jun 3, 2009 at 2:12 PM, Scott Carey <scott@xxxxxxxxxxxxxxxxx> wrote:
>> Postgres could fix its connection scalability issues -- that is entirely
>> independent of connection pooling.
> 
> Really?  I'm surprised.  I thought the two were very closely related.
> Could you expand on your thinking here?
> 

They are closely related only by coincidence of Postgres' flaws.
If Postgres did not scale so poorly as the number of idle connections (or
of active ones) increases, connection poolers would rarely be needed at all.

Most connection pools in clients (JDBC, ODBC, for example) are designed to
limit the connection create/close count, not the number of idle connections.
They reduce creation/deletion specifically by leaving connections idle for a
while to allow re-use.
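
To illustrate the kind of client-side pool described above, here is a
minimal sketch in Python. It is not how any particular JDBC or ODBC pool is
implemented; the class name IdleReusePool, the connect factory, and the
idle_timeout parameter are all made up for the example, and the sketch is
not thread-safe. The point is only that released connections are parked
idle rather than closed, so the server still sees them as open.

```python
import time
from collections import deque


class IdleReusePool:
    """Toy client-side pool: reuses idle connections instead of reconnecting."""

    def __init__(self, connect, idle_timeout=60.0, clock=time.monotonic):
        self._connect = connect            # hypothetical factory returning a connection
        self._idle_timeout = idle_timeout  # seconds a connection may sit idle
        self._clock = clock
        self._idle = deque()               # (connection, time it was released)
        self.created = 0                   # number of real connects paid for

    def acquire(self):
        now = self._clock()
        while self._idle:
            conn, released_at = self._idle.pop()  # most recently released first
            if now - released_at <= self._idle_timeout:
                return conn                # re-use: no create/close cost
            conn.close()                   # idle for too long: discard it
        self.created += 1
        return self._connect()

    def release(self, conn):
        # The connection is NOT closed here; it stays open and idle.
        self._idle.append((conn, self._clock()))
```

Note that a pool like this lowers the create/close count but does nothing
to lower the number of idle connections the server has to carry.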

Other things that can be called "connection concentrators" differ in that
they are additionally trying to put a band-aid over server design flaws that
make idle connections hurt scalability, or to prevent resource consumption
issues that the database doesn't have enough control over on its own (again,
a flaw -- a server should be as resilient to bad client behavior and its
resource consumption as possible).

Most 'modern' server designs throttle active actions internally.  Apache's
(very old, and truly somewhat 1995-ish) process-or-thread-per-connection
model is being abandoned for event-driven models in the next version, so it
can scale, like the higher-performing web servers, to 20K+ keep-alive
connections with significantly fewer threads / processes.
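
As a rough illustration of the event-driven idea (not of how Apache or any
other web server is actually implemented), the following Python sketch uses
the standard asyncio library to hold many idle connections without giving
each one its own thread. The function names handle and demo and the n_idle
parameter are invented for the example.

```python
import asyncio
import threading


async def handle(reader, writer):
    # Each connection is a coroutine, not a thread. While the client is
    # idle, the coroutine is parked inside readline().
    while True:
        line = await reader.readline()
        if not line:
            break                      # client closed the connection
        writer.write(line)             # echo the request back
        await writer.drain()
    writer.close()


async def demo(n_idle=50):
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]

    # Open n_idle connections and leave them idle, keep-alive style.
    conns = [await asyncio.open_connection("127.0.0.1", port)
             for _ in range(n_idle)]
    threads_while_idle = threading.active_count()

    # One connection becomes active; the others stay parked.
    reader, writer = conns[0]
    writer.write(b"ping\n")
    await writer.drain()
    reply = await reader.readline()

    for _, w in conns:
        w.close()
        await w.wait_closed()
    server.close()
    return threads_while_idle, reply


if __name__ == "__main__":
    threads, reply = asyncio.run(demo())
    print("threads while connections were idle:", threads)
    print("reply on the one active connection:", reply)
```

The thread count reported while the connections sit idle should stay small
and should not grow with n_idle, which is the property a
process-or-thread-per-connection design lacks.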

SQL is significantly more complicated than HTTP and requires a lot more
state, which dictates a very different design, but nothing about it requires
idle connections to cause reduced SMP scalability.

Besides making sure idle connections have almost no impact on performance
(they should just eat up some RAM), it is important to scale well as the
number of active queries increases.  Although the OS is responsible for a
lot of this, there are many things that the application can do to help out.
If Postgres had a "max_active_connections" parameter, for example, then the
worst-case memory used by work_mem would scale with this value and not with
max_connections.  This would further make connection poolers/concentrators
less useful from a performance and resource management perspective.
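
A sketch of what such a limit might look like, in Python rather than
anything resembling server code.  "max_active_connections" is the
hypothetical parameter named above, not an existing Postgres setting, and
ActiveQueryGate and worst_case_work_mem_mb are names invented for the
example.  The arithmetic also assumes one work_mem allowance per running
query, which is a simplification: a single query can use several multiples
of work_mem.

```python
import threading


def worst_case_work_mem_mb(work_mem_mb, concurrent_queries):
    # Simplifying assumption: one work_mem allowance per running query.
    return work_mem_mb * concurrent_queries


class ActiveQueryGate:
    """Admits at most max_active queries at once; the rest wait, still connected."""

    def __init__(self, max_active):
        self._slots = threading.BoundedSemaphore(max_active)
        self._lock = threading.Lock()
        self.active = 0        # queries currently running
        self.peak_active = 0   # highest concurrency observed

    def run(self, query_fn):
        with self._slots:      # blocks while max_active queries are running
            with self._lock:
                self.active += 1
                self.peak_active = max(self.peak_active, self.active)
            try:
                return query_fn()
            finally:
                with self._lock:
                    self.active -= 1
```

With illustrative numbers, work_mem = 64 MB and max_connections = 1000 give
a worst case of 64 * 1000 = 64000 MB under this simplification, while a
limit of 32 active queries gives 64 * 32 = 2048 MB, however many
connections are open and idle.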

Once the above is done, connection pooling, whether integrated or provided
by a third party, would mostly have value only for clients that cannot pool
or cache connections on their own.  This is the state of connection pooling
with most other DBs today.

> ...Robert
> 

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)