Kirk Strauser wrote:
...
I understand why pooling within a process itself is a good thing.
However, say I have two users running the same program on different
desktop machines. At present, those applications connect with the
same username/password that's tied to the program and not the actual
user. It seems like if Abby and Barb end up sharing the same
connection from the pool, and Abby runs some giant report query, then
Barb would get held back while she waits for it to finish. Is that
true? Even if not, what would be the advantage in the two of them
sharing a connection?
Short answer: performance.
First you need the correct mental model. The "pool" in pooling is a
pre-established pool of connections to the database. When a client
connects to the pooler, the pooler decides, based on its configuration,
which database connection will receive the statement sent to the pool by
the client. The pooler can also monitor the connection pool and decide
when to increase the pool size or release database connections.
Pgbouncer (part of Skype tools) has worked very well for me. It can be
configured to handle connections in a variety of ways. At one extreme, a
connection to the database is only used for the duration of a single
statement then that database connection is available and waiting for the
next statement. At the other extreme, a connection is assigned and tied
up for the entire duration that the client is connected to the pooler.
In any case, Barb and Abby will not be sharing a database connection
*simultaneously* - if they are using their desktop app which is
connected to the db through the pooler and they are both running queries
at the same time then each of them will have their own connection to the
database.
Whether or not you will benefit from pooling depends on the nature of
your application. If, for example, the application runs queries by
connecting, running query, disconnecting and those queries are
long-running and/or infrequent then pooling will probably be of little
if any benefit.
At the other extreme, say you have a very high volume of short
single-statement transactions (think web). You start seeing a
significant impact of the connection setup/teardown time. Even if the
app connects for each web-hit, connecting to a pooler like pgbouncer is
much faster than connecting directly to the database (I've done some
tests showing nearly a factor of 10 improvement in page delivery rates).
But if the pooler is set to statement-level pooling, you can have
hundreds of web processes each maintaining a persistent connection to
the pooler while the pooler only needs to maintain sufficient
connections for simultaneous statements - generally a tiny fraction of
the number of web processes.
The pooler typically has a number of other configuration parameters that
govern things like the maximum amount of time a server connection will
be used before it is dropped and reconnected as well as how long idle
database connections hang around before they are dropped.
Beware. You can not just drop a pooler in place without understanding
the applications that will connect through it. Setting GUC variables,for
example, is a red flag. You could easily create a situation where you
increase work-memory for a specific statement but that statement ends up
assigned to a different database backend or have a client connection be
unaware of GUC changes made by a previously connected client that was
assigned to that backend.
But if your application is designed to work well with pooling, it can
provide dramatic performance benefits.
Cheers,
Steve
--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general