Re: Proposal of tunable fix for scalability of 8.4

On 3/12/09 11:28 AM, "Tom Lane" <tgl@xxxxxxxxxxxxx> wrote:

> Scott Carey <scott@xxxxxxxxxxxxxxxxx> writes:
>> They are not meaningless.  It is certainly more to understand, but the
>> test is entirely valid without that.  In a CPU bound / RAM bound case,
>> as concurrency increases you look for the throughput trend, the %CPU
>> use trend and the context switch rate trend.  More information would be
>> useful but the test is validated by the evidence that it is held up by
>> lock contention.
>
> Er ... *what* evidence?  There might be evidence somewhere that proves
> that, but Jignesh hasn't shown it.  The available data suggests that the
> first-order performance limiter in this test is something else.
> Otherwise it should be possible to max out the performance with a lot
> less than 1000 active backends.
>
>                         regards, tom lane

Evidence:

Ramp up the concurrency and measure throughput.  Throughput climbs linearly, then peaks at some point X with low CPU utilization.  Change the lock code, and throughput scales past that point to much higher CPU load.
That’s evidence.  Please explain a scenario that accounts for it otherwise.  Your last statement above is true but not applicable here: the test does not run 1000 backends, it lists 1000 users.

There is a key difference between users and backends.  In fact, the evidence shows that the 1000 figure can’t be active backends (the column is labeled users).  If the test is not I/O bound, throughput must cap out once the number of active backends is roughly at or below the number of CPUs, and as noted it does not.  That isn’t proof that something is wrong with the test; it’s proof that the 1000 number cannot be active backends.
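
To make the users-vs-backends distinction concrete, here is some back-of-the-envelope Little’s law arithmetic (in Python; every number below is invented for illustration, none are taken from Jignesh’s results):

    # Back-of-the-envelope Little's-law arithmetic for a closed-loop test.
    # All numbers here are hypothetical, purely for illustration.
    users = 1000          # simulated users in the harness
    think_time = 0.200    # seconds a "user" waits between requests
    service_time = 0.002  # seconds a backend works per request

    # Each user completes one request per (think + service) seconds:
    throughput = users / (think_time + service_time)   # requests/sec

    # Little's law: requests in service = throughput * service time.
    active_backends = throughput * service_time

    print(f"throughput      ~ {throughput:.0f} req/s")
    print(f"active backends ~ {active_backends:.1f}")
    # ~4950 req/s, but only ~10 backends busy at any instant:
    # "1000 users" is nowhere near 1000 active backends.
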
I spent a decade solving and tuning CPU scalability problems in CPU/memory bound systems.  Sophisticated tests peak at a user count >> CPU count, because real users don’t execute requests as fast as possible.  Through a chain of servers several layers deep, each tier can have a different level of concurrent activity.  It’s useful to measure concurrency at each tier, but that is nearly impossible in Postgres (easy in Oracle / MSSQL).  Most systems have a limited thread pool but can queue far more requests than that; Postgres and many other databases don’t, so clients must do the queueing via connection pools.  The resulting behavior of too much concurrency is thrashing and inefficiency.  In a test that ramps up concurrency, this shows up as a peak in throughput followed by a steep drop-off as concurrency pushes the system into the thrashing state, typically accompanied by heavy context switching and sometimes RAM pressure.
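
For what it’s worth, the shape of such a driver is easy to sketch.  The following is a minimal closed-loop load generator with think time, assuming psycopg2 and a pgbench-style table; the connection string, query, and timings are placeholders, not the actual benchmark:

    # Minimal closed-loop load driver with think time (a sketch, not the
    # actual benchmark).  Assumes psycopg2 and a pgbench-style schema;
    # connection string, query, and timings are placeholders.
    import threading, time, random
    import psycopg2

    USERS = 100          # simulated users; each holds one backend
    THINK_TIME = 0.2     # seconds of "user" think time per request
    DURATION = 30        # seconds to run
    latencies = []       # list.append is atomic in CPython

    def user_loop():
        conn = psycopg2.connect("dbname=bench")    # one backend per user
        conn.autocommit = True
        cur = conn.cursor()
        deadline = time.time() + DURATION
        while time.time() < deadline:
            start = time.time()
            cur.execute("SELECT abalance FROM pgbench_accounts WHERE aid = %s",
                        (random.randint(1, 100000),))
            cur.fetchone()
            latencies.append(time.time() - start)
            time.sleep(THINK_TIME)   # backend sits idle while user "thinks"
        conn.close()

    threads = [threading.Thread(target=user_loop) for _ in range(USERS)]
    for t in threads: t.start()
    for t in threads: t.join()
    print(f"{len(latencies)} requests in {DURATION}s "
          f"-> {len(latencies)/DURATION:.0f} req/s")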

The only way to construct a test that shows the described behavior (linear ramp-up, then plateau) is to have lock contention, an I/O bottleneck, or CPU saturation.  The number of users is irrelevant; the trend is the same regardless of the relationship between user count and active backend count (0 delay or 1 second delay gives the same result with a different X axis).  If it were an I/O or client bottleneck, changing the lock code wouldn’t have made it faster.
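
A direct way to check the lock-contention diagnosis while the test sits at its plateau is to sample how many backends are blocked on locks.  A rough sketch (again assuming psycopg2, and using the 8.x-era pg_stat_activity.waiting boolean; the connection string is a placeholder):

    # Sample how many backends are blocked on locks during the plateau.
    # If the plateau were an I/O or client bottleneck, few backends
    # would show as waiting; under lock contention, many will.
    import time
    import psycopg2

    conn = psycopg2.connect("dbname=bench")
    conn.autocommit = True
    cur = conn.cursor()
    for _ in range(30):                 # one sample per second for 30s
        cur.execute("""SELECT sum(CASE WHEN waiting THEN 1 ELSE 0 END),
                              count(*)
                       FROM pg_stat_activity""")
        blocked, total = cur.fetchone()
        print(f"{blocked} of {total} backends waiting on locks")
        time.sleep(1)
    conn.close()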

The evidence is conclusive: the first test result was limited by locks, and changing the lock code increased throughput.
