On 10/11/2017 02:26 AM, pinker wrote:
> Tomas Vondra-4 wrote
>> I'm probably a bit dumb (after all, it's 1AM over here), but can you
>> explain the CPU chart? I'd understand percentages (say, 75% CPU used)
>> but what do the seconds / fractions mean? E.g. when the system time
>> reaches 5 seconds, what does that mean?
>
> hehe, no you've just spotted a mistake, it's supposed to be 50 cores :)
> out of 80 in total
>

Ah, so it should say '50 cores' instead of '5s'? Well, that's a busy
system I guess.

>
> Tomas Vondra-4 wrote
>> Have you tried profiling using perf? That usually identifies hot spots
>> pretty quickly - either in PostgreSQL code or in the kernel.
>
> I was always afraid because of overhead, but maybe it's time to start ...
>

I don't follow. If you're not in trouble, a little bit of additional
overhead is not an issue (but you generally don't need profiling at that
moment). If you're already in trouble, then spending a bit of CPU time
on a basic CPU profile is certainly worth it.

>
> Tomas Vondra-4 wrote
>> What I meant is that if the system evicts this amount of buffers all the
>> time (i.e. there doesn't seem to be any sudden spike), then it's
>> unlikely to be the cause (or related to it).
>
> I was actually thinking about a scenario where different sessions
> want to read/write from or to many different relfilenodes at one time,
> which could cause page swapping between shared buffers and the OS cache?

Perhaps. If the sessions only do reads, that would not be visible in
buffers_backend I believe (not sure ATM, would have to check the source).
But it'd be visible in buffers_alloc and certainly in blks_read.

> we see that context switches on cpu are increasing as well.
> kernel documentation says that using page tables instead of the
> Translation Lookaside Buffer (TLB) is very costly, and on some blogs
> I have seen recommendations that using huge pages (so more addresses
> can fit in the TLB) will help here, but postgresql, unlike oracle,
> cannot use them for anything other than page buffering (so 16gb) ...
> so process memory still needs to use 4k pages.
>

The context switches are likely due to a large number of runnable
processes competing for the CPU. Also, memory bandwidth is increasingly
an issue on big boxes ...

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
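
To put the huge-page remark in numbers, here is a minimal sketch of the
page counts involved. It assumes 2 MiB huge pages (the usual x86-64 size;
the thread does not say which size is available) and reads the quoted
"16gb" as 16 GiB of shared buffers. pages_needed is a hypothetical helper
for illustration, not a PostgreSQL or kernel API.

```python
# Rough page-count arithmetic for the huge-page discussion.
# Assumptions (not stated in the thread): 2 MiB huge pages, and
# "16gb" interpreted as 16 GiB of shared buffers.

KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def pages_needed(region_bytes: int, page_bytes: int) -> int:
    """Number of pages needed to map a region (ceiling division)."""
    return -(-region_bytes // page_bytes)


shared_buffers = 16 * GIB
small = pages_needed(shared_buffers, 4 * KIB)  # regular 4 KiB pages
huge = pages_needed(shared_buffers, 2 * MIB)   # assumed 2 MiB huge pages

print(f"4 KiB pages: {small}")
print(f"2 MiB pages: {huge}")
print(f"ratio: {small // huge}x fewer mappings with huge pages")
```

Under these assumptions the shared buffers need about 4.2 million 4 KiB
mappings but only 8192 huge-page mappings. The estimate covers only the
shared buffers; backend-private memory that stays on 4 KiB pages is not
included.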