Re: heavy load-high cpu itilization

Scott Marlowe <scott.marlowe@xxxxxxxxx> · Fri, 29 Jul 2011 13:54:13 -0600

On Mon, Jul 25, 2011 at 12:00 PM, Filippos <filippos.kal@xxxxxxxxx> wrote:
> Dear all
>
> first of all congratulations on your greak work here since from time to time
> i 've found many answers to my problems. unfortunately for this specific
> problem i didnt find much relevant information, so i would ask for your
> guidance dealing with the following situation:
>
> we have a dedicated server (8.4.4, redhat) with 24 cpus and 36 GB or RAM. i

There are known data eating bugs in 8.4.4 you should upgrade to
8.4.latest as soon as possible.

> would say that the traffic in the server is huge and the cpu utilization is
> pretty high too (avg ~ 75% except during the nights when is it much lower).
> i am trying to tune the server a little bit to handle this problem. the
> incoming data in the database are about 30-40 GB /day.

So you're either CPU or IO bound.  We need to see which.

Look at these two pages:
http://wiki.postgresql.org/wiki/Guide_to_reporting_problems
http://wiki.postgresql.org/wiki/SlowQueryQuestions

to get started.

> at first the checkpoint_segments were set to 50, the checkpoint_timeout at
> 15 min and the checkpoint_completion_target was 0.5 sec.

checkpoint_completion_target is not in seconds, it's a percentage to
have completely by the time the next checkpoint arrives.  a checkpoint
completion target of 1.0 means that the bg writer should write out
data fast enough to flush everything out of WAL to the disks right as
you reach checkpoint timeout.  the more aggressive this is the more of
the data will already be flushed to disk when the timeout occurs.
However, this comes at the expense of more IO overall as multiple
updates to the same block result in multiple writes instead of just
one.

> i noticed that the utilization of the server was higher when it was close to
> making a checkpoint and since the parameter of full_page_writes is ON , i
> changed the parameters mentioned above to (i did that after reading a lot of
> stuff online):
> checkpoint_segments->250
> checkpoint_timeout->40min
> checkpoint_completion_target -> 0.8
>
> but the cpu utilization is not significantly lower. another parameter i will
> certainly change is the wal_buffers which is now set at 64KB and i plan to
> make it 16MB. can this parameter cause a significant percentage of the
> problem?

Most of the work done by checkpointing / background writing is IO
intensive, not CPU intensive.

> are there any suggestions what i can do to tune better the server? i can
> provide any information you find relevant for the configuration of the
> server, the OS, the storage etc

First you need to more accurately identify the problem.  Tools like
iostat, vmstat, top, and so forth can help you figure out if the
problem is that you're IO bound or CPU bound.  It's also possible
you've got a thundering herd issue where there's too many processes
all trying to vie for the limited number of cores at the same time.
If you've got more than 30k to 50k context switches per second in
vmstat it's likely you're getting too many things trying to run at
once.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance