On 5/18/09, Scott Carey <scott@xxxxxxxxxxxxxxxxx> wrote:
> Great data Dimitri!

Thank you! :-)

> I see a few key trends in the poor scalability:
>
> The throughput scales roughly with %CPU fairly well. But CPU used doesn't
> go past ~50% on the 32 core tests. This indicates lock contention.

You should not look at the #1 stats, but at the #2 ones - they are all with
the latest "fixes", and on all of them the CPU is used well (90% in the
picture for 32 cores). Also, keep in mind that each of these cores runs 2
hardware threads, so from the Solaris point of view they are seen as 64 CPUs,
and %busy is accounted against 64 CPUs.

> Other proof of lock contention is the mutex locks / sec graph, which climbs
> rapidly as the system gets more inefficient (along with context switches).

Exactly, except that no locking was seen on the processes while I tried to
trace them.. What I think will be needed here is a global and correlated
tracing of all PG processes - I did not expect to do it now, but next time.

> Another trend is the system calls/sec, which caps out with the test at
> about 400,000 per sec at the peak (non-prepared statement) result. Note
> that when the buffer size is 256MB, the performance scales much worse and
> is slower. And correlated with this, the system calls/sec per transaction
> is more than double, at slower throughput.

Of course - even when the data are cached by the filesystem, to get them you
still need a read() system call..
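For illustration, here is a minimal DTrace sketch that counts read() system
calls issued by postgres processes per second - a rough proxy for buffer
reads served from the filesystem cache rather than from shared_buffers. The
script name and output format are my own choices, not from the thread; only
the standard syscall provider is used:

    #!/usr/sbin/dtrace -s
    /* read_count.d - count read() syscalls from postgres, per second.
       A rough proxy for shared_buffers misses served by the FS cache. */
    #pragma D option quiet

    syscall::read:entry
    /execname == "postgres"/
    {
            @reads = count();
    }

    tick-1sec
    {
            printa("read() calls/sec: %@d\n", @reads);
            trunc(@reads);
    }

Run as root, e.g. # ./read_count.d, and watch the per-second rate while the
benchmark runs.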
> Using the OS to cache pages is not as fast as pages in shared_buffers, by a
> more significant amount with many cores and higher concurrency than in the
> low concurrency case.

Exactly - that's what I also wanted to demonstrate, because I often hear "PG
is delegating caching to the filesystem" and I don't think it's optimal :-)

> The system is largely lock limited in the poor scaling results. This holds
> true with or without the use of prepared statements -- which help some, but
> not a lot, and don't affect the scalability.

We agree here, but again - 20K mutex spins/sec is quite a low value; that's
why I hope that on the bigger server it will be clearer where the bottleneck
is :-)

Rgds,
-Dimitri

> 4096MB shared buffers, 32 cores, 8.4, read only:
> http://dimitrik.free.fr/Report_20090505/5539_dim_STAT_70.html
>
> 256MB cache, 32 cores, 8.4, read-only:
> http://dimitrik.free.fr/Report_20090505/5539_dim_STAT_52.html
>
> 4096MB shared buffs, 32 cores, 8.4, read only, prepared statements:
> http://dimitrik.free.fr/Report_20090505/5539_dim_STAT_70.html
>
> On 5/18/09 11:00 AM, "Dimitri" <dimitrik.fr@xxxxxxxxx> wrote:
>
>> Folks, I've just published a full report including all results here:
>> http://dimitrik.free.fr/db_STRESS_PostgreSQL_837_and_84_May2009.html
>>
>> From my point of view, the first step is to understand where the time is
>> wasted on a single query (even when the statement is prepared, it still
>> runs slower compared to MySQL).
>>
>> Then, to investigate the scalability issue, I think a bigger server will
>> be needed here (I'm looking for 64 cores at least :-))
>>
>> If you have some other ideas or patches (like Simon) - don't hesitate to
>> send them - once I get access to the server again, the available test
>> time will be very limited..
>>
>> Best regards!
>> -Dimitri
>>
>> On 5/18/09, Simon Riggs <simon@xxxxxxxxxxxxxxx> wrote:
>>>
>>> On Thu, 2009-05-14 at 20:25 +0200, Dimitri wrote:
>>>
>>>> # lwlock_wait_8.4.d `pgrep -n postgres`
>>>>
>>>>                Lock Id        Mode   Combined Time (ns)
>>>>       FirstLockMgrLock   Exclusive               803700
>>>>        BufFreelistLock   Exclusive              3001600
>>>>       FirstLockMgrLock      Shared              4586600
>>>>    FirstBufMappingLock   Exclusive              6283900
>>>>    FirstBufMappingLock      Shared             21792900
>>>
>>> I've published two patches to -Hackers to see if we can improve the
>>> read-only numbers on 32+ cores.
>>>
>>> Try shared_buffer_partitions = 256
>>>
>>> --
>>>  Simon Riggs           www.2ndQuadrant.com
>>>  PostgreSQL Training, Services and Support
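For reference, a minimal sketch of what a wait-time script in the spirit of
the lwlock_wait_8.4.d quoted above might look like. It assumes PostgreSQL 8.4
built with --enable-dtrace, whose lwlock-wait-start/lwlock-wait-done static
probes report the lock id and mode; I am assuming here that arg0/arg1 carry
the lock id and requested mode as integers, and the id-to-name mapping that
produces labels such as FirstBufMappingLock is omitted:

    #!/usr/sbin/dtrace -s
    /* lwlock_wait.d <pid> - sum lwlock wait times per lock id and mode.
       Sketch only: lock ids and modes are printed as raw integers. */
    #pragma D option quiet

    postgresql$1:::lwlock-wait-start
    {
            self->ts = timestamp;
    }

    postgresql$1:::lwlock-wait-done
    /self->ts/
    {
            /* arg0 = lock id, arg1 = lock mode (assumed 8.4 probe args) */
            @wait[arg0, arg1] = sum(timestamp - self->ts);
            self->ts = 0;
    }

    END
    {
            printf("%20s %10s %20s\n",
                "Lock Id", "Mode", "Combined Time (ns)");
            printa("%20d %10d %@20d\n", @wait);
    }

Invoke it as in the quoted session, # ./lwlock_wait.d `pgrep -n postgres`,
and stop it with Ctrl-C to print the accumulated summary.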