Re: Adding more memory = huge cpu load

alexandre - aldeia digital <adaldeia@xxxxxxxxx> · Tue, 11 Oct 2011 09:14:50 -0300

On 11-10-2011 03:42, Greg Smith wrote:
On 10/10/2011 01:31 PM, alexandre - aldeia digital wrote:
I dropped checkpoint_timeout to 1min and turned on log_checkpoints:

<2011-10-10 14:18:48 BRT >LOG: checkpoint complete: wrote 6885 buffers
(1.1%); 0 transaction log file(s) added, 0 removed, 1 recycled;
write=29.862 s, sync=28.466 s, total=58.651 s
<2011-10-10 14:18:50 BRT >LOG: checkpoint starting: time

Sync times that go to 20 seconds suggest there's a serious problem here
somewhere. But it would have been better to do these changes one at a
time: turn on log_checkpoints, collect some data, then try lowering
checkpoint_timeout. A checkpoint every minute is normally a bad idea, so
that change may have caused this other issue.

I set it back to 5 minutes. Thanks.
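
For anyone who wants to collect these numbers over a longer period, here is a minimal Python sketch that pulls the write, sync and total times out of a "checkpoint complete" entry like the one quoted above. The function name parse_checkpoint and the SYNC_WARN_S threshold are only illustrative assumptions (the 20 second figure is borrowed from Greg's remark, it is not a PostgreSQL setting), and the pattern is written against the 9.0 log format shown in this thread.

```python
import re

# Pattern for the "checkpoint complete" entry in the format quoted in this
# thread (PostgreSQL 9.0 with log_checkpoints = on).
CHECKPOINT_RE = re.compile(
    r"checkpoint complete: wrote (?P<buffers>\d+) buffers \((?P<pct>[\d.]+)%\);"
    r".*?write=(?P<write>[\d.]+) s, sync=(?P<sync>[\d.]+) s, "
    r"total=(?P<total>[\d.]+) s"
)

# Illustrative threshold only, taken from the "20 seconds" remark in the thread.
SYNC_WARN_S = 20.0


def parse_checkpoint(entry):
    """Return the checkpoint figures as a dict, or None if nothing matches."""
    # The entry is wrapped over several lines in the mail, so flatten it first.
    flat = " ".join(entry.split())
    m = CHECKPOINT_RE.search(flat)
    if m is None:
        return None
    return {
        "buffers": int(m.group("buffers")),
        "pct": float(m.group("pct")),
        "write_s": float(m.group("write")),
        "sync_s": float(m.group("sync")),
        "total_s": float(m.group("total")),
    }


SAMPLE = """<2011-10-10 14:18:48 BRT >LOG: checkpoint complete: wrote 6885 buffers
(1.1%); 0 transaction log file(s) added, 0 removed, 1 recycled;
write=29.862 s, sync=28.466 s, total=58.651 s"""

info = parse_checkpoint(SAMPLE)
slow_sync = info["sync_s"] > SYNC_WARN_S
print(info, "slow sync" if slow_sync else "ok")
```

Run over a whole log file, this makes it easier to compare sync times before and after a single configuration change.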

procs -------------------memory------------------ ---swap-- -----io---- --system-- -----cpu-------
r b swpd free buff cache si so bi bo in cs us sy id wa st
34 0 2696 8289288 117852 38432268 0 0 8 2757 2502 4148 80 20 0 0 0
39 1 2696 8286128 117852 38432348 0 0 24 622 2449 4008 80 20 0 0 0
41 0 2696 8291100 117852 38433792 0 0 64 553 2487 3419 83 17 0 0 0
... Notice that we have no idle % in the cpu column.

You also have no waiting for I/O! This is just plain strange; I've never
seen checkpoint sync time spikes with no I/O waits before. System time
going to 20% isn't normal either.
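
To make the reading of the vmstat output explicit, here is a small Python sketch that maps each sample row onto the column names from the header and prints the CPU split. The helper name parse_vmstat_row is an assumption made for illustration; the three rows are copied from the output quoted above.

```python
# Column names from the second vmstat header line quoted in this thread.
VMSTAT_COLUMNS = "r b swpd free buff cache si so bi bo in cs us sy id wa st".split()


def parse_vmstat_row(row):
    """Map one vmstat data row onto the column names."""
    values = [int(v) for v in row.split()]
    if len(values) != len(VMSTAT_COLUMNS):
        raise ValueError("unexpected number of vmstat columns: %d" % len(values))
    return dict(zip(VMSTAT_COLUMNS, values))


ROWS = [
    "34 0 2696 8289288 117852 38432268 0 0 8 2757 2502 4148 80 20 0 0 0",
    "39 1 2696 8286128 117852 38432348 0 0 24 622 2449 4008 80 20 0 0 0",
    "41 0 2696 8291100 117852 38433792 0 0 64 553 2487 3419 83 17 0 0 0",
]

samples = [parse_vmstat_row(r) for r in ROWS]
for s in samples:
    # r = runnable processes, us/sy/id/wa = user, system, idle, I/O wait (%)
    print(
        "runnable=%d user=%d%% system=%d%% idle=%d%% iowait=%d%%"
        % (s["r"], s["us"], s["sy"], s["id"], s["wa"])
    )
```

In every sample, user plus system time adds up to 100%, with idle and I/O wait both at zero and 34 to 41 processes in the run queue, which is the combination being called unusual here.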

Is there anything I can use to detect which process is causing the
system time to increase?

I don't know what's going on with this server. What I would normally do
in this case is use "top -c" to see what processes are taking up so much
runtime, and then look at what they are doing with pg_stat_activity. You
might see the slow processes in the log files by setting
log_min_duration_statement instead. I'd be suspicious of Linux given
your situation though.
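
As a rough complement to "top -c", system time can also be read per process from /proc/<pid>/stat on Linux, where utime and stime are fields 14 and 15 (in clock ticks, cumulative since the process started). The sketch below is an assumption-laden illustration, not something suggested in this thread: the function names are made up, and SAMPLE_STAT is an invented line used only to show the parsing.

```python
import glob


def parse_proc_stat(line):
    """Extract pid, command name, utime and stime from a /proc/<pid>/stat line."""
    # The command name sits in parentheses and may contain spaces or
    # parentheses itself, so split on the first "(" and the last ")".
    head, _, tail = line.rpartition(")")
    pid_str, _, comm = head.partition("(")
    fields = tail.split()
    # tail starts at field 3 (state), so utime (14) and stime (15) are at 11, 12.
    return {
        "pid": int(pid_str),
        "comm": comm,
        "utime": int(fields[11]),
        "stime": int(fields[12]),
    }


def system_share(proc):
    """Fraction of the process's CPU ticks spent in system (kernel) mode."""
    total = proc["utime"] + proc["stime"]
    return proc["stime"] / total if total else 0.0


def scan_proc():
    """Return all readable processes, highest cumulative system time first."""
    results = []
    for path in glob.glob("/proc/[0-9]*/stat"):
        try:
            with open(path) as fh:
                results.append(parse_proc_stat(fh.read()))
        except (OSError, ValueError, IndexError):
            continue  # process exited or line not in the expected shape
    return sorted(results, key=lambda p: p["stime"], reverse=True)


# Invented example line, for illustration only.
SAMPLE_STAT = "1234 (postgres) R 1 1234 1234 0 -1 4202496 500 0 0 0 1500 900 0 0 20 0 1 0"

parsed = parse_proc_stat(SAMPLE_STAT)
print(parsed["comm"], parsed["pid"], "system share: %.3f" % system_share(parsed))
```

Because the counters are cumulative, two calls to scan_proc() a few seconds apart, compared by difference, give a better picture of which backends are burning system time right now. The pid can then be matched against pg_stat_activity.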

Last night, I put another disk in the server and installed Debian 6,
preserving the same structure and only pointing the new PostgreSQL 9.0.5
build at the old data. Today, the problem persists.

And for everyone who asked: the performance is poor, to the point of
being unusable.

I wonder if increasing the memory is a coincidence, and the real cause
is something related to the fact that you had to reboot to install it.
You might have switched to a newer kernel in the process too, for
example; I'd have to put a kernel bug on the list of suspects with this
unusual vmstat output.

I don't think that is a coincidence, because this machine has been
rebooted at other times without problems.

Best regards.

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)