On Fri, Mar 21, 2014 at 3:36 PM, Guillaume Smet <guillaume.smet@xxxxxxxxx> wrote:
> On Fri, Mar 21, 2014 at 5:17 PM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
>> It will cost you, in ProcArray scans for example. But lots-of-idle-
>> connections is exactly what a pooler is supposed to prevent. If you have
>> a server that can handle say 10 active queries, you should have a pool
>> size of 10, not 100. (If you have a server that can actually handle
>> 100 active queries, I'd like to have your IT budget.)
>>
>> The proposed design sounds fairly reasonable to me, as long as users are
>> clear on how to set the pool size --- and in particular that bigger is
>> not better. Clueless users could definitely shoot themselves in the
>> foot, though.
>
> Yeah, well.
>
> My understanding of what happened on the field is that people usually
> set the pool size limit quite high because they don't want to
> experience connection starvation even if there is a temporary slowdown
> of their application/database.

My experience is that a small transaction-mode connection pool serving very quick queries can sometimes fail to use the hardware fully if the connections aren't set to autocommit on the client side, because the extra network roundtrips hold onto the server slot for a sizable portion of each query's lifecycle.

So, my recommendation: use protocol-level autocommit for read-only queries, and a pool of cores+spindles workers; that will use your hardware fully.

On Fri, Mar 21, 2014 at 3:41 PM, David Johnston <polobo@xxxxxxxxx> wrote:
>> Reaching the maxPoolSize from the minPoolSize means creating the
>> connections at the crucial moment where the client application is in the
>> desperate need of completing an important query/transaction which the
>> primary responsibility since it cannot hold the data collected.
>
> One query is slowed down a little in the unusual situation where not enough
> pooled connections are available. To fix that you want to slow down the
> entire server all of the time? Really? And even if this is sometimes the
> best option your assertion is unqualified so do you really think this is
> best for everyone, always?

Casting deadlocks aside for a moment, I don't think a variable-size connection pool makes any sense. With a properly sized pool, the only reason for connection starvation is hardware overload, and when the hardware is overloaded you really don't want to throw more load at it; you want to let it cool down.

The solution, if you have many heavy, OLAP-style queries that block the rest, is to have two pools, and to size both so that you don't overload the server (or at least so that any overload stays controlled). Send the OLAP queries to one pool and the OLTP queries to the other, and you get a smooth flow, even if you cannot guarantee both 100% hardware utilization and maximum throughput. (See the first sketch at the end of this mail.)

Now, if we do consider deadlocks, you might see connection starvation because all of the connections are waiting for an operation that is deadlocked in application code: the deadlock cannot occur at the DB level if you use transaction mode, but you can have open transactions waiting on a mutex that is itself waiting for a connection. I've experienced application-level deadlocks like these, and the solution for me has always been a reasonable timeout plus a reserve pool of connections: when clients have been waiting for more than X seconds (whatever threshold is abnormal given what you know about your load), grow the pool to free up resources and try to dislodge the deadlocked application threads. (Second sketch at the end.)
Sometimes, regular overload triggers the reserve pool, so you cannot be too generous on the number of connections you'll have in reserve.
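
To make the autocommit + two-pool suggestions above concrete, here is a minimal sketch in Python with psycopg2. Everything specific in it is invented for illustration (the DSN, the pool sizes, the assumption of two spindles); it's the shape of the idea, not a drop-in implementation:

    import multiprocessing
    import psycopg2
    from psycopg2.pool import ThreadedConnectionPool

    DSN = "dbname=app user=app host=db"   # hypothetical
    CORES = multiprocessing.cpu_count()
    SPINDLES = 2                          # assumption: two data disks

    # Pool for quick read-only/OLTP queries, sized cores + spindles.
    # (Note psycopg2's pool raises PoolError rather than blocking when
    # exhausted; a real setup would handle that, or use an external pooler.)
    oltp_pool = ThreadedConnectionPool(1, CORES + SPINDLES, DSN)

    # Separate, smaller pool for the heavy OLAP-style queries, so they can
    # never occupy more than their own slots and starve the quick ones.
    olap_pool = ThreadedConnectionPool(1, 2, DSN)

    def run_readonly(sql, params=()):
        conn = oltp_pool.getconn()
        try:
            # Autocommit: no transaction open/close roundtrips, so the
            # server slot is held only while the query itself runs.
            conn.autocommit = True
            with conn.cursor() as cur:
                cur.execute(sql, params)
                return cur.fetchall()
        finally:
            oltp_pool.putconn(conn)

    def run_olap(sql, params=()):
        conn = olap_pool.getconn()
        try:
            with conn.cursor() as cur:
                cur.execute(sql, params)
                return cur.fetchall()
        finally:
            olap_pool.putconn(conn)

The point is only the routing: both pools talk to the same server, but the heavy queries are capped at their own two slots, whatever happens.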
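
And the reserve-pool idea, hand-rolled so the timeout behaviour is explicit. Again the numbers (10 connections, 3 in reserve, 5-second threshold) are made up, and a real version would want error handling and a way to shrink back after an incident:

    import queue
    import threading
    import psycopg2

    class ReservePool:
        def __init__(self, dsn, size, reserve, wait_threshold):
            self._dsn = dsn
            self._wait_threshold = wait_threshold
            self._reserve = threading.Semaphore(reserve)
            self._idle = queue.Queue()
            for _ in range(size):
                self._idle.put(psycopg2.connect(dsn))

        def getconn(self):
            try:
                # Normal case: wait up to the threshold for a free connection.
                return self._idle.get(timeout=self._wait_threshold)
            except queue.Empty:
                # Abnormally long wait: suspect an application-level deadlock
                # and dip into the reserve to dislodge the stuck threads.
                if self._reserve.acquire(blocking=False):
                    return psycopg2.connect(self._dsn)
                # Reserve exhausted too: all we can do is keep waiting.
                return self._idle.get()

        def putconn(self, conn):
            # Reserve connections come back here as well, so the pool stays
            # grown after an incident; shrinking is left out of the sketch.
            self._idle.put(conn)

    pool = ReservePool("dbname=app user=app host=db",
                       size=10, reserve=3, wait_threshold=5.0)

As said above, plain overload will sometimes trip the threshold and eat into the reserve, which is exactly why the reserve has to stay small.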