Forgot some data: with the second test above, CPU was 48% user, 18% sys, 35% idle. CPU utilization increased from 46% in the first test to 65%. The corresponding throughput increase was not as large, but that is expected on an 8-threads-per-core server, since memory bandwidth and cache resources, at a minimum, are shared, and only trivial tasks can scale at 100%.
-----------------
Now, with 0ms delay, no threading change:
Throughput is 136000/min @184 users, response time 13ms. Response time has not jumped too drastically yet, but linear performance scaling stopped at about 130 users. ProcArrayLock is busy, very busy. CPU: 35% user, 11% system, 54% idle.
With 0ms delay and lock modification 2 (wake some, but not all):
Throughput is 161000/min @328 users, response time 28ms. At 184 users, the same load as before the change, throughput is 147000/min with response time 0.12ms. Performance scales linearly up to 144 users, then slows, increasing only slightly with additional concurrency after that.
The throughput increase is between 15% and 25%, depending on the point of comparison.
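For anyone following along, here is a rough sketch of what a "wake some, but not all" release policy can look like. This is purely illustrative, not the actual patch (whose exact wakeup rule isn't shown here), and it models the lock's wait queue as a toy linked list; the names (waiter, lw_mode, wake_leading_batch) are invented. The policy shown wakes the contiguous run of shared waiters at the head of the queue, or just the head waiter if it is exclusive:

    /* Illustrative sketch only -- not the actual patch. */
    #include <stdio.h>

    typedef enum { LW_SHARED, LW_EXCLUSIVE } lw_mode;

    typedef struct waiter
    {
        int            pid;      /* stand-in for the waiting backend */
        lw_mode        mode;
        struct waiter *next;
    } waiter;

    /* "Wake some, but not all": detach and return the contiguous run
     * of shared waiters at the head of the queue (or just the head
     * waiter if it is exclusive).  Everything behind the first
     * exclusive waiter stays queued for a later release. */
    static waiter *
    wake_leading_batch(waiter **queue)
    {
        waiter *head = *queue;
        waiter *last;

        if (head == NULL)
            return NULL;

        if (head->mode == LW_EXCLUSIVE)
        {
            *queue = head->next;     /* wake the lone exclusive waiter */
            head->next = NULL;
            return head;
        }

        last = head;                 /* wake the leading shared run */
        while (last->next != NULL && last->next->mode == LW_SHARED)
            last = last->next;
        *queue = last->next;         /* first exclusive (or NULL) stays */
        last->next = NULL;
        return head;
    }

    int
    main(void)
    {
        /* Queue front-to-back: S1, S2, X3, S4 */
        waiter w4 = {4, LW_SHARED,    NULL};
        waiter w3 = {3, LW_EXCLUSIVE, &w4};
        waiter w2 = {2, LW_SHARED,    &w3};
        waiter w1 = {1, LW_SHARED,    &w2};
        waiter *queue = &w1;
        waiter *w;

        for (w = wake_leading_batch(&queue); w; w = w->next)
            printf("woken: %d\n", w->pid);     /* 1, 2 */
        for (w = queue; w; w = w->next)
            printf("queued: %d\n", w->pid);    /* 3, 4 */
        return 0;
    }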
Based on the above, I would guess that attaining closer to 100% utilization (it's hard to get past 90% with that many cores no matter what) will probably give another 10 to 15% improvement at most, to maybe 180000/min throughput.
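As a quick sanity check on that arithmetic: another 10% over 161000/min is about 177000/min, and another 15% is about 185000/min, so ~180000/min sits right in that range.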
It's also rather interesting that the 2000-connection case with wait times gets 170000/min throughput and beats the 328-users-with-0ms-delay result above. I suspect the 'wake all' version is just faster. I would love to see a 'wake all shared, leave exclusives at the front of the queue' version, since that would not allow lock starvation.
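Roughly, the release policy I have in mind would look like the sketch below, reusing the same toy linked-list queue as above (again, invented names, not PostgreSQL's actual LWLock code): on release, every shared waiter anywhere in the queue is detached and woken, while exclusive waiters keep their positions at the front of the remaining queue, so a steady stream of shared acquirers can never starve them.

    /* Toy sketch of "wake all shared, leave exclusives at the front of
     * the queue" -- invented names, not PostgreSQL's actual LWLock
     * code.  Shared waiters are all detached and woken; exclusive
     * waiters keep their queue order, so they cannot be starved. */
    #include <stdio.h>

    typedef enum { LW_SHARED, LW_EXCLUSIVE } lw_mode;

    typedef struct waiter
    {
        int            pid;
        lw_mode        mode;
        struct waiter *next;
    } waiter;

    static waiter *
    wake_all_shared(waiter **queue)
    {
        waiter  *wake_head = NULL;
        waiter **wake_tail = &wake_head;
        waiter **pos = queue;

        while (*pos != NULL)
        {
            if ((*pos)->mode == LW_SHARED)
            {
                waiter *w = *pos;

                *pos = w->next;       /* unlink from the wait queue */
                w->next = NULL;
                *wake_tail = w;       /* append to the wake list */
                wake_tail = &w->next;
            }
            else
                pos = &(*pos)->next;  /* exclusive waiter stays put */
        }
        return wake_head;             /* caller wakes each of these */
    }

    int
    main(void)
    {
        /* Queue front-to-back: X1, S2, S3, X4, S5 */
        waiter w5 = {5, LW_SHARED,    NULL};
        waiter w4 = {4, LW_EXCLUSIVE, &w5};
        waiter w3 = {3, LW_SHARED,    &w4};
        waiter w2 = {2, LW_SHARED,    &w3};
        waiter w1 = {1, LW_EXCLUSIVE, &w2};
        waiter *queue = &w1;
        waiter *w;

        for (w = wake_all_shared(&queue); w; w = w->next)
            printf("woken: %d\n", w->pid);     /* 2, 3, 5 */
        for (w = queue; w; w = w->next)
            printf("queued: %d\n", w->pid);    /* 1, 4 */
        return 0;
    }

Note this does let shared waiters that arrived behind an exclusive waiter jump ahead of it, which is exactly the 'wake all shared' semantics; the starvation protection is only that exclusives keep their place and get woken by later releases.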