On 5/18/09, Scott Carey <scott@xxxxxxxxxxxxxxxxx> wrote:
> Great data Dimitri!

Thank you! :-)

> I see a few key trends in the poor scalability:
>
> The throughput scales roughly with %CPU fairly well. But CPU used doesn't
> go past ~50% on the 32 core tests. This indicates lock contention.

You should not look at the #1 stats, but at the #2 ones - they are all with
the latest "fixes", and on all of them the CPU is used well (90% in the
picture for 32 cores). Also, keep in mind that each of these cores runs 2
hardware threads, so from the Solaris point of view they are seen as 64 CPUs,
and %busy is accounted against 64 CPUs.

> Other proof of lock contention is the mutex locks / sec graph, which climbs
> rapidly as the system gets more inefficient (along with context switches).

Exactly, except that no locking was seen on the processes while I tried to
trace them.. What I think will be needed here is a global and correlated
tracing of all PG processes - I did not expect to do it now, but next time.

> Another trend is the system calls/sec, which caps out with the test at
> about 400,000 per sec at the peak (non-prepared statement) result. Note
> that when the buffer size is 256MB, the performance scales much worse and
> is slower. And correlated with this, the system calls/sec per transaction
> is more than double, at slower throughput.

Of course - even when the data are cached by the filesystem, to get them you
still need a read() system call..
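For illustration, here is a minimal DTrace sketch that counts read() system
calls issued by postgres processes per second - a rough proxy for buffer
reads served from the filesystem cache rather than from shared_buffers. The
script name and output format are my own choices, not from the thread; only
the standard syscall provider is used:

    #!/usr/sbin/dtrace -s
    /* read_count.d - count read() syscalls from postgres, per second.
       A rough proxy for shared_buffers misses served by the FS cache. */
    #pragma D option quiet

    syscall::read:entry
    /execname == "postgres"/
    {
            @reads = count();
    }

    tick-1sec
    {
            printa("read() calls/sec: %@d\n", @reads);
            trunc(@reads);
    }

Run as root, e.g. # ./read_count.d, and watch the per-second rate while the
benchmark runs.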
> Using the OS to cache pages is not as fast as pages in shared_buffers, by a
> more significant amount with many cores and higher concurrency than in the
> low concurrency case.

Exactly - that's what I also wanted to demonstrate, because I often hear "PG
is delegating caching to the filesystem" and I don't think it's optimal :-)

> The system is largely lock limited in the poor scaling results. This holds
> true with or without the use of prepared statements -- which help some, but
> not a lot, and don't affect the scalability.

We agree here, but again - 20K mutex spins/sec is quite a low value; that's
why I hope that on the bigger server it will be clearer where the bottleneck
is :-)

Rgds,
-Dimitri

> 4096MB shared buffers, 32 cores, 8.4, read only:
> http://dimitrik.free.fr/Report_20090505/5539_dim_STAT_70.html
>
> 256MB cache, 32 cores, 8.4, read-only:
> http://dimitrik.free.fr/Report_20090505/5539_dim_STAT_52.html
>
> 4096MB shared buffs, 32 cores, 8.4, read only, prepared statements:
> http://dimitrik.free.fr/Report_20090505/5539_dim_STAT_70.html
>
> On 5/18/09 11:00 AM, "Dimitri" <dimitrik.fr@xxxxxxxxx> wrote:
>
>> Folks, I've just published a full report including all results here:
>> http://dimitrik.free.fr/db_STRESS_PostgreSQL_837_and_84_May2009.html
>>
>> From my point of view, the first step is to understand where the time is
>> wasted on a single query (even when the statement is prepared, it still
>> runs slower compared to MySQL).
>>
>> Then, to investigate the scalability issue, I think a bigger server will
>> be needed here (I'm looking for 64 cores at least :-))
>>
>> If you have some other ideas or patches (like Simon) - don't hesitate to
>> send them - once I get access to the server again, the available test
>> time will be very limited..
>>
>> Best regards!
>> -Dimitri
>>
>> On 5/18/09, Simon Riggs <simon@xxxxxxxxxxxxxxx> wrote:
>>>
>>> On Thu, 2009-05-14 at 20:25 +0200, Dimitri wrote:
>>>
>>>> # lwlock_wait_8.4.d `pgrep -n postgres`
>>>>
>>>>                Lock Id        Mode   Combined Time (ns)
>>>>       FirstLockMgrLock   Exclusive               803700
>>>>        BufFreelistLock   Exclusive              3001600
>>>>       FirstLockMgrLock      Shared              4586600
>>>>    FirstBufMappingLock   Exclusive              6283900
>>>>    FirstBufMappingLock      Shared             21792900
>>>
>>> I've published two patches to -Hackers to see if we can improve the
>>> read-only numbers on 32+ cores.
>>>
>>> Try shared_buffer_partitions = 256
>>>
>>> --
>>>  Simon Riggs           www.2ndQuadrant.com
>>>  PostgreSQL Training, Services and Support
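For reference, a minimal sketch of what a wait-time script in the spirit of
the lwlock_wait_8.4.d quoted above might look like. It assumes PostgreSQL 8.4
built with --enable-dtrace, whose lwlock-wait-start/lwlock-wait-done static
probes report the lock id and mode; I am assuming here that arg0/arg1 carry
the lock id and requested mode as integers, and the id-to-name mapping that
produces labels such as FirstBufMappingLock is omitted:

    #!/usr/sbin/dtrace -s
    /* lwlock_wait.d <pid> - sum lwlock wait times per lock id and mode.
       Sketch only: lock ids and modes are printed as raw integers. */
    #pragma D option quiet

    postgresql$1:::lwlock-wait-start
    {
            self->ts = timestamp;
    }

    postgresql$1:::lwlock-wait-done
    /self->ts/
    {
            /* arg0 = lock id, arg1 = lock mode (assumed 8.4 probe args) */
            @wait[arg0, arg1] = sum(timestamp - self->ts);
            self->ts = 0;
    }

    END
    {
            printf("%20s %10s %20s\n",
                "Lock Id", "Mode", "Combined Time (ns)");
            printa("%20d %10d %@20d\n", @wait);
    }

Invoke it as in the quoted session, # ./lwlock_wait.d `pgrep -n postgres`,
and stop it with Ctrl-C to print the accumulated summary.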