On 4/7/10 3:36 PM, Joshua D. Drake wrote:
On Wed, 2010-04-07 at 14:45 -0700, Craig James wrote:
On 4/7/10 2:40 PM, Joshua D. Drake wrote:
On Wed, 2010-04-07 at 14:37 -0700, Craig James wrote:
Most of the time Postgres runs nicely, but two or three times a day we get a huge spike in the CPU load that lasts just a short time -- the load average jumps to 10-20. Today it hit 100. Sometimes days go by with no spike events. During these spikes, the system is completely unresponsive (you can't even log in via ssh).
I managed to capture one such event using top(1) with the "batch" option as a background process. See output below - it shows 19 active postgres processes, but I think it missed the bulk of the spike.
What does iostat 5 say during the jump?
It's very hard to say ... I'll have to start a background job to watch for a day or so. While it's happening, you can't log in, and any open windows become unresponsive. I'll probably have to run it at high priority using nice(1) to get any data at all during the event.
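A minimal sketch of such a background watcher, assuming a Linux box where /proc/loadavg and /proc/meminfo are readable. The function names, the log format and the priority value are made up for illustration; it is not a replacement for iostat or sar, just a way to get load and swap numbers on record while the machine is unreachable.

```python
import os
import time


def parse_meminfo(text):
    """Turn /proc/meminfo-style text into a dict of integer values (mostly kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def sample_line(loadavg_text, meminfo_text, now):
    """Format one log line from the raw contents of the two /proc files."""
    load1 = loadavg_text.split()[0]  # first field is the 1-minute load average
    mem = parse_meminfo(meminfo_text)
    swap_used = mem.get("SwapTotal", 0) - mem.get("SwapFree", 0)
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    return "%s load1=%s swap_used_kb=%d mem_free_kb=%d" % (
        stamp, load1, swap_used, mem.get("MemFree", 0))


def watch(log_path, interval=5, samples=None):
    """Append a sample every `interval` seconds; samples=None means run forever."""
    try:
        os.nice(-10)  # hypothetical value; raising priority needs root
    except OSError:
        pass  # not root: keep sampling at normal priority
    taken = 0
    with open(log_path, "a") as log:
        while samples is None or taken < samples:
            with open("/proc/loadavg") as f:
                loadavg = f.read()
            with open("/proc/meminfo") as f:
                meminfo = f.read()
            log.write(sample_line(loadavg, meminfo, time.time()) + "\n")
            log.flush()  # hand each sample to the OS right away
            taken += 1
            time.sleep(interval)
```

Started as root in the background with something like watch("/var/tmp/spike.log"), it leaves a timestamped trail to read after the spike has passed. A rising swap_used_kb alongside the load spike would point at swapping rather than pure CPU.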
Do you have sar running? Say, sar -A?
No, I don't have it installed. I'll have a look. At first glance it looks like a combination of what I can get with "top -b" and vmstat, but with a single program.
My guess is that it is not CPU; it is IO, and your CPU usage is all WAIT
on IO.
To have your CPUs so flooded that they are the cause of an inability to
log in is pretty suspect.
I thought so too, except that I can't log in during the flood. If the CPUs were all waiting on IO, logging in should be easy.
Greg's suggestion that shared_buffers and work_mem are too big for an 8 GB system fits these symptoms -- if it's having a swap storm, login is effectively impossible.
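To make the arithmetic behind that concrete, here is a rough sketch. The settings below are hypothetical, not the ones from this server, and the formula is only a pessimistic upper bound: work_mem applies per sort or hash operation, so a single backend can use it more than once.

```python
def worst_case_mb(shared_buffers_mb, work_mem_mb, max_connections,
                  ops_per_query=1):
    """Pessimistic bound: every backend runs ops_per_query sorts/hashes at once."""
    return shared_buffers_mb + work_mem_mb * max_connections * ops_per_query


def overcommit_mb(ram_mb, shared_buffers_mb, work_mem_mb, max_connections,
                  ops_per_query=1, os_reserve_mb=1024):
    """How far the bound exceeds RAM left after an assumed OS reserve (<= 0 fits)."""
    need = worst_case_mb(shared_buffers_mb, work_mem_mb, max_connections,
                         ops_per_query)
    return need - (ram_mb - os_reserve_mb)


# Hypothetical settings on an 8 GB machine.
RAM_MB = 8 * 1024
print(worst_case_mb(2048, 64, 100))          # 2048 + 64 * 100 = 8448 MB
print(overcommit_mb(RAM_MB, 2048, 64, 100))  # 8448 - (8192 - 1024) = 1280 MB over
```

A positive result means the settings can, in the worst case, push the machine into swap, which is the kind of situation where login stops working.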
Craig
Joshua D. Drake
Thanks,
Craig