Re: pgbench could not send data to client: Broken pipe


 



Kevin Grittner wrote:
> Of course, the only way to really know some of these numbers is to
> test your actual application on the real hardware under realistic
> load; but sometimes you can get a reasonable approximation from
> early tests or "gut feel" based on experience with similar
> applications.

And that latter part only works if your gut is as accurate as Kevin's. For most people, even a rough direct measurement is much more useful than any estimate.

Anyway, Kevin's point--that ultimately you cannot really be executing more things at once than you have CPUs--is an accurate one to remember here. One reason to put connection pooling in front of your database is that the server cannot handle thousands of active connections at once without switching between them very frequently. That wastes both CPU and other resources on contention that could be avoided.

If you expect, say, 1000 simultaneous users, and you have 48 CPUs, there is only 48 ms worth of CPU time available to each user per second on average. If you drop that to 100 users using a pooler, they'll each get 480 ms worth of it. But no matter what, when the CPUs are busy enough to always have a queued backlog, they will clear at best 48 CPUs * 1 second = 48000 ms of work from that queue each second, no matter how you set up the ratios here.

Now, imagine that the average query takes 24ms. The two scenarios work out like this:

Without pooler: the query needs 24 ms of CPU but only gets 48 ms of CPU per second while contending with 999 other processes, so it takes 24 / 48 = 0.5 seconds to execute.

With pooler: Worst-case, the pooler queue is filled and there are 900 users ahead of this one, representing 21600 ms worth of work to clear before this request will become active. The query waits 21600 / 48000 = 0.45 seconds to get runtime on the CPU. Once it starts, though, it's only contending with 99 other processes, so it gets 1/100 of the available resources. 480 ms of CPU time executes per second for this query; it runs in 0.05 seconds at that rate. Total runtime: 0.45 + 0.05 = 0.5 seconds!
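The arithmetic in the two scenarios above can be sketched as a quick back-of-the-envelope model. This is only an illustration of the fair-share math being described: it assumes perfectly even CPU time-slicing and ignores context-switch overhead, and the function names and numbers are mine, not anything from PostgreSQL or a real pooler:

```python
# Toy fair-share model of the two scenarios: 48 CPUs, 24 ms average query.
# Assumes perfect time-slicing with zero contention overhead.

CPUS = 48
QUERY_MS = 24  # CPU time one average query needs

def no_pooler(active_connections):
    """Each query shares the CPUs evenly with all other connections."""
    share_ms_per_sec = CPUS * 1000 / active_connections  # e.g. 48 ms/s for 1000 users
    return QUERY_MS / share_ms_per_sec                   # seconds to finish

def with_pooler(pool_size, queued_ahead):
    """Wait for the queue to drain, then run among pool_size connections."""
    # The server clears at most CPUS * 1000 ms of work per wall-clock second.
    wait_s = queued_ahead * QUERY_MS / (CPUS * 1000)
    run_s = QUERY_MS / (CPUS * 1000 / pool_size)
    return wait_s + run_s

print(no_pooler(1000))        # 0.5 s: full contention the whole way
print(with_pooler(100, 900))  # 0.45 s queued + 0.05 s running = 0.5 s
```

Either way the model hands back 0.5 seconds, which is the point: the pooler moves the waiting out of the scheduler and into an orderly queue.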

So the incoming query in this not completely contrived case (I just picked the numbers to make the math even) takes the same amount of time to deliver a result either way. It's just a matter of whether it spends that time waiting for a clear slice of CPU time, or fighting with a lot of other processes the whole way. Once the incoming connections exceed CPUs by enough of a margin that a pooler can expect to keep all the CPUs busy, it delivers results at the same speed as using a larger number of connections. And since the "without pooler" case assumes perfect slicing of time into units, it's the unrealistic one; contention among the 1000 processes will actually make it slower than the pooled version in the real world. You won't see anywhere close to 48000 ms worth of work delivered per second anymore if the server is constantly losing its CPU cache, swapping among an average of 21 connections/CPU. Whereas if it's only slightly more than 2 connections per CPU, each CPU should alternate between its two processes easily enough.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@xxxxxxxxxxxxxxx   www.2ndQuadrant.us


--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

