On 5/19/09, Scott Carey <scott@xxxxxxxxxxxxxxxxx> wrote: > > On 5/19/09 3:46 AM, "Dimitri" <dimitrik.fr@xxxxxxxxx> wrote: > >> On 5/19/09, Scott Carey <scott@xxxxxxxxxxxxxxxxx> wrote: >>> >>> On 5/18/09 3:32 PM, "Dimitri" <dimitrik.fr@xxxxxxxxx> wrote: >>> >>>> On 5/18/09, Scott Carey <scott@xxxxxxxxxxxxxxxxx> wrote: >>>>> Great data Dimitri!' >>>> >>>> Thank you! :-) >>>> >>>>> >>>>> I see a few key trends in the poor scalability: >>>>> >>>>> The throughput scales roughly with %CPU fairly well. But CPU used >>>>> doesn't >>>>> go past ~50% on the 32 core tests. This indicates lock contention. >>>>> >>>> >>>> You should not look on #1 STATs, but on #2 - they are all with the >>>> latest "fixes" - on all of them CPU is used well (90% in pic on >>>> 32cores). >>>> Also, keep in mind these cores are having 2 threads, and from Solaris >>>> point of view they are seen as CPU (so 64 CPU) and %busy is accounted >>>> as for 64 CPU >>>> >>> >>> Well, if the CPU usage is actually higher, then it might not be lock >>> waiting >>> -- it could be spin locks or context switches or cache coherency >>> overhead. >>> Postgres may also not be very SMT friendly, at least on the hardware >>> tested >>> here. >> >> do you mean SMP or CMT? ;-) >> however both should work well with PostgreSQL. I also think about CPU >> affinity - probably it may help to avoid CPU cache misses - but makes >> sense mostly if pooler will be added as a part of PG. > > Symmetric Multi Threading (HyperThreading in Intels marketing terms, other > marketing terms for Sun or IBM). One CPU core that can handle more than one > concurrently executing thread. > Technically, 'SMT' allows instructions in flight from multiple threads at > once in a superscalar Cpu core while some implementations differ and might > technically CMT (one thread or the other, but can switch fast, or a > non-superscalar core). > > For many implementations of 'multiple threads on one CPU core' many of the > processor resources are reduced per thread when it is active -- caches get > split, instruction re-order buffers are split, etc. That is rather hardware > implementation dependant. > > For Intel's SMT (and other similar), spin-locks hurt scalability if they > aren't using new special instructions for the spin to yield pipeline slots > to the other thread. > > Generally, code that stresses common processor resources more than CPU > execution will scale poorly with SMT/CMT etc. All application are scaling well anyway, except if you have any kind of lock contention inside of the application itself or meet any kind of system resource become hot. But well, here we may spend days to discuss :-) > > So I'm not sure about the Postgres details, but the general case of an > application that doesn't benefit from these technologies exists, and there > is a non-zero chance that Postgres has some characteristics of such an app. > >>> >>> (what was the context switch rate? I didn't see that in the data, just >>> mutex spins). >> >> increasing with a load, as this ex.: >> http://dimitrik.free.fr/Report_20090505/5539_dim_STAT_100.html#bmk_CPU_CtxSwit >> ch_100 >> > > Well, on most systems over 100K context switches/sec is a lot. And those > reach 180000 /sec. > However, this is 'only' 10 context switches per transaction and less than > 20% system CPU, so maybe those numbers aren't quite as big as they seem. > > Out of curiosity, what was the context switch rate for MySql at its peak > throughput? the main MySQL problem is a mutex locking like here: http://dimitrik.free.fr/Report_20090504/5465_dim_STAT_31.html#bmk_SpinMtx_31 so you have to limit a number of active threads to lower this contention (similar to pooler idea folks told here) and the context switch is even higher (~200K/sec) >> >>> >>> The scalability curve is definitely showing something. Prepared >>> statements >>> were tried, as were most of the other suggestions other than one: >>> >>> What happens if the queries are more complicated (say, they take 15ms >>> server >>> side with a more complicated plan required)? That is a harder question >>> to >>> answer >> >> What I observed is: if planner takes more long time (like initially >> with 8.3.7 and analyze target 1000) the scalability problem is >> appearing more strange - >> http://dimitrik.free.fr/Report_20090505/5521_dim_STAT_18.html - as you >> see CPU even not used more than 60% , and as you may see spin locks >> are lowering - CPUs are not spinning for locks, there is something >> else.. >> I'm supposing a problem of some kind of synchronization - background >> processes are not waking up on time or something like this... >> Then, if more time spent on the query execution itself and not planner: >> - if it'll be I/O time - I/O will hide everything else until you >> increase a storage performance and/or add more RAM, but then you come >> back to the initial issue :-) >> - if it'll be a CPU time it may be interesting! :-) >> >> Rgds, >> -Dimitri >> > > Ok, so that's good info that the planner or parser side seems to scale less > effectively than the execution (as the results show), but I'm wondering > about queries with longer execution times not longer planner times. I'm > wondering that, because its my opinion that most applications that will use > larger scale hardware will have more complicated queries than your test. > Its also greedy on my part since most queries in my applications are > significantly more complicated. > Regardless of my opinions -- this test is on one extreme (small fast > queries) of the spectrum. Its useful to know some data points on other > parts of the spectrum. As I've mentioned before, such fast queries are very common for some applications (like banking transactions, stock management, internet forums, etc.) - 5-7years ago the goal with this test was to keep response time under 1sec (I'm not kidding :-) but nowdays we're running under a millisecond.. Crazy progress, no? :-)) However, I've started to extend db_STRESS kit to accept any kind of query against any kind of db schema. So if you have an interesting data model and some queries to run - I'll be happy to adapt them as a new scenario! :-)) Rgds, -Dimitri > > > -- > Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance > -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance