Re: High CPU Utilization

On Mar 20, 2009, at 4:29 PM, Scott Marlowe wrote:

On Fri, Mar 20, 2009 at 2:26 PM, Joe Uhl <joeuhl@xxxxxxxxx> wrote:
On Mar 17, 2009, at 12:19 AM, Greg Smith wrote:

On Tue, 17 Mar 2009, Gregory Stark wrote:

Hm, well the tests I ran for posix_fadvise were actually on a Perc5 --
though who knows if it was the same under the hood -- and I saw better
performance than this. I saw about 4MB/s for a single drive and up to
about 35MB/s for 15 drives. However this was using linux md raid-0, not
hardware raid.

Right, it's the hardware RAID on the Perc5 I think people mainly complain about. If you use it in JBOD mode and let the higher performance CPU in your main system drive the RAID functions it's not so bad.

--
* Greg Smith gsmith@xxxxxxxxxxxxx http://www.gregsmith.com Baltimore, MD

I have not yet had a chance to try software raid on the standby server (still planning to) but wanted to follow up to see if there was any good way to figure out what the postgresql processes are spending their CPU time on.

We are under peak load right now, and I have Zabbix plotting CPU utilization and CPU wait (from vmstat output) along with all sorts of other vitals on charts. CPU utilization is a sustained 90% - 95% and CPU Wait is hanging below 10%. Since being pointed at vmstat by this list I have been watching CPU Wait and it does get high at times (hence still wanting to try Perc5 in JBOD) but then there are sustained periods, right now included, where our CPUs are just getting crushed while wait and IO (only doing about 1.5 MB/sec right now) are very low.

This high CPU utilization only occurs when under peak load and when our JDBC
pools are fully loaded.  We are moving more things into our cache and
constantly tuning indexes/tables but just want to see if there is some
underlying cause that is killing us.

Any recommendations for figuring out what our database is spending its CPU
time on?

What does the cs entry on vmstat say at this time?  If your cs is
skyrocketing then you're getting a context switch storm, which is
usually a sign that there are just too many things going on at once,
or you've got an old kernel, things like that.

The cs column (plus the cpu columns) of vmstat 1 30 reads as follows:

cs    us  sy id wa
11172 95  4  1  0
12498 94  5  1  0
14121 91  7  1  1
11310 90  7  1  1
12918 92  6  1  1
10613 93  6  1  1
9382  94  4  1  1
14023 89  8  2  1
10138 92  6  1  1
11932 94  4  1  1
15948 93  5  2  1
12919 92  5  3  1
10879 93  4  2  1
14014 94  5  1  1
9083  92  6  2  0
11178 94  4  2  0
10717 94  5  1  0
9279  97  2  1  0
12673 94  5  1  0
8058  82 17  1  1
8150  94  5  1  1
11334 93  6  0  0
13884 91  8  1  0
10159 92  7  0  0
9382  96  4  0  0
11450 95  4  1  0
11947 96  3  1  0
8616  95  4  1  0
10717 95  3  1  0

We are running the 2.6.28.7-2 kernel. I am unfamiliar with vmstat output, but going by the man page (cs = "context switches per second") my numbers seem very high.
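
In case it helps anyone compare numbers, here is a rough sketch of how a capture like the one above could be boiled down to a few figures. It assumes the cs/us/sy/id/wa columns have been pasted into a string with the header row first. The names parse_vmstat and summarize are made up for illustration, and the core count is a placeholder to fill in, since it is not stated in this thread. Only the first few rows of the capture are repeated here to keep it short.

```python
import statistics

# First rows of the capture above; paste the full output in the same format.
SAMPLE = """\
cs    us  sy id wa
11172 95  4  1  0
12498 94  5  1  0
14121 91  7  1  1
"""


def parse_vmstat(text):
    """Turn whitespace-separated vmstat columns into a list of dicts.

    The first non-empty line is taken as the header row.
    """
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, map(int, row))) for row in rows]


def summarize(rows, cores):
    """Summarize context switches and CPU split.

    `cores` is an assumption supplied by the caller, not something
    vmstat reports in these columns.
    """
    cs = [r["cs"] for r in rows]
    cs_mean = statistics.mean(cs)
    return {
        "samples": len(rows),
        "cs_mean": cs_mean,
        "cs_max": max(cs),
        "cs_per_core": cs_mean / cores,
        "us_mean": statistics.mean(r["us"] for r in rows),
        "sy_mean": statistics.mean(r["sy"] for r in rows),
        "wa_mean": statistics.mean(r["wa"] for r in rows),
    }


if __name__ == "__main__":
    # cores=8 is a placeholder; substitute the real core count.
    for key, value in summarize(parse_vmstat(SAMPLE), cores=8).items():
        print(f"{key}: {value}")
```

Whether a given cs_per_core figure counts as a storm is a judgment call that depends on the hardware and workload; the script only does the arithmetic.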

Our JDBC pools combined currently top out at 400 connections (and we are doing work on all 400 right now). I may try making those pools even smaller. Are there any general rules of thumb for figuring out how many connections you should service at maximum? I know of the memory constraints, but I am thinking more along the lines of connections per CPU core.


--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
