Re: Server hitting 100% CPU usage, system comes to a crawl.

Brian Fehrle <brianf@xxxxxxxxxxxxxxxxxxx> · Tue, 01 Nov 2011 11:56:00 -0600

Update on this:

We switched over to another machine with the same hardware; however,
this system was running with some older parameters we had set in its
postgresql.conf file.

So we went from 400 max_connections to 200, and from 160MB work_mem
to 200MB. So far, this other system seems to be running OK.

Other than the obvious fact that each connection uses a certain amount
of memory, is there anything else to watch for when raising
max_connections to numbers like 400? When the system jumped to 100% CPU
usage, the number of backends connected to the cluster peaked at 250
and was generally around 175, well below the 400 max_connections we
allow. So could the max_connections setting be the culprit?
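To put rough numbers on the per-connection memory point, here is a back-of-the-envelope sketch in Python. It assumes each backend holds at most one work_mem allocation at a time; since work_mem applies per sort or hash operation, a single complex query can use several, so these are rough figures rather than a real upper bound. The function name and the ops_per_backend parameter are hypothetical.

```python
# Back-of-the-envelope work_mem arithmetic for the two configurations in
# this thread. Assumption: each backend uses ONE work_mem allocation at a
# time. work_mem is per sort/hash operation, so one complex query can use
# several; shared_buffers and other per-backend overhead are ignored.

MACHINE_RAM_MB = 128 * 1024  # the 128 GB machine described in the thread


def worst_case_work_mem_mb(max_connections: int, work_mem_mb: int,
                           ops_per_backend: int = 1) -> int:
    """MB used if every allowed backend held ops_per_backend work_mem areas."""
    return max_connections * work_mem_mb * ops_per_backend


configs = {
    "max_connections=400, work_mem=160MB": worst_case_work_mem_mb(400, 160),
    "max_connections=200, work_mem=200MB": worst_case_work_mem_mb(200, 200),
}

for label, mb in configs.items():
    print(f"{label}: {mb} MB ({mb / MACHINE_RAM_MB:.0%} of RAM)")
```

With one allocation per backend this prints 64000 MB (49% of RAM) for the 400-connection setup and 40000 MB (31% of RAM) for the 200-connection one; two concurrent sorts per backend would double both figures.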

I'll be watching the cluster as we run on the new configuration (with 
only 200 max_connections).

- Brian F

On 10/27/2011 03:22 PM, Brian Fehrle wrote:
On 10/27/2011 02:50 PM, Tom Lane wrote:
Brian Fehrle <brianf@xxxxxxxxxxxxxxxxxxx> writes:
Hi all, need some help/clues on tracking down a performance issue.
PostgreSQL version: 8.3.11
I've got a system that has 32 cores and 128 GB of RAM. We have
connection pooling set up, with about 100-200 persistent connections
open to the database. Our applications then use these connections to
query the database constantly, but when a connection isn't currently
executing a query, it's <IDLE>. On average, at any given time, 3-6
connections are actually executing a query, while the rest
are <IDLE>.
About once a day, queries that normally take just a few seconds slow
way down and start to pile up, to the point where instead of just
having 3-6 queries running at any given time, we get 100-200. The whole
system comes to a crawl, and looking at top, the CPU usage is 99%.
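Sampling pg_stat_activity periodically makes the active vs. <IDLE> split, and a pile-up like this one, easy to track. The Python sketch below assumes you have already fetched the current_query column (for example with SELECT current_query FROM pg_stat_activity on 8.3; 9.2 and later replaced it with separate state and query columns) and only does the counting. PILEUP_THRESHOLD and the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable

# Hypothetical cutoff: the post describes 3-6 active queries as normal and
# 100-200 during an incident; 20 is an arbitrary line between the two.
PILEUP_THRESHOLD = 20


def classify(current_query: str) -> str:
    """Bucket one 8.3-era pg_stat_activity.current_query value."""
    if current_query == "<IDLE>":
        return "idle"
    if current_query == "<IDLE> in transaction":
        return "idle_in_transaction"
    if current_query == "<insufficient privilege>":
        # Non-superusers cannot see other users' queries.
        return "unknown"
    return "active"


def summarize(current_queries: Iterable[str]) -> Counter:
    """Count backends per bucket for one pg_stat_activity sample."""
    return Counter(classify(q) for q in current_queries)


def is_pileup(counts: Counter, threshold: int = PILEUP_THRESHOLD) -> bool:
    """True when more queries are running than the (hypothetical) threshold."""
    return counts["active"] > threshold


if __name__ == "__main__":
    # Synthetic sample shaped like the normal case described in the post:
    # about 175 backends, 5 of them running a query.
    sample = ["<IDLE>"] * 170 + ["SELECT 1"] * 5
    counts = summarize(sample)
    print(dict(counts), "pile-up" if is_pileup(counts) else "normal")
```

Logging these counts every few seconds would show whether the active count climbs gradually or jumps all at once, and how many idle backends are present when it starts.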

This is jumping to a conclusion based on insufficient data, but what you
describe sounds a bit like the sinval queue contention problems that we
fixed in 8.4. Some prior reports of that:
http://archives.postgresql.org/pgsql-performance/2008-01/msg00001.php
http://archives.postgresql.org/pgsql-performance/2010-06/msg00452.php

If your symptoms match those, the best fix would be to update to 8.4.x
or later, but a stopgap solution would be to cut down on the number of
idle backends.

            regards, tom lane

That sounds close to the issue I am seeing. The main differences are
that my spike lasts much longer than a few minutes and can only be
resolved by restarting the cluster. Also, the top output in that second
link shows most of the CPU time in 'user' rather than in 'sys' like mine.

Is there anything else I can look at to get more information on this
'sinval queue contention' problem?

Also, could having my CPU usage high in 'sys' rather than 'us' be a
red flag, or is that normal?
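On the 'sys' vs. 'us' question, a first step is to measure the split over an interval rather than reading it off top. This Linux-only Python sketch reads the aggregate cpu line of /proc/stat twice and reports the user and system shares; the field order follows proc(5), and the function names are hypothetical. It only measures the split; by itself it does not show whether a high system share comes from the sinval issue.

```python
import os
import time
from typing import List, Tuple

# Field order of the aggregate "cpu" line in /proc/stat (see proc(5)):
# user nice system idle iowait irq softirq steal guest guest_nice
# guest/guest_nice are already counted in user/nice, so only the first
# eight fields are summed to avoid double counting.
N_FIELDS = 8


def read_cpu_line(text: str) -> List[int]:
    """Parse the aggregate 'cpu' line (not cpu0, cpu1, ...) from /proc/stat text."""
    for line in text.splitlines():
        if line.startswith("cpu "):
            return [int(x) for x in line.split()[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def user_sys_share(before: List[int], after: List[int]) -> Tuple[float, float]:
    """Fractions of elapsed CPU time spent in user (incl. nice) and system mode."""
    delta = [a - b for a, b in zip(after[:N_FIELDS], before[:N_FIELDS])]
    total = sum(delta)
    if total == 0:
        return 0.0, 0.0
    user = delta[0] + delta[1]  # user + nice, roughly top's "us" plus "ni"
    system = delta[2]           # kernel time, top's "sy"
    return user / total, system / total


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    with open("/proc/stat") as f:
        first = read_cpu_line(f.read())
    time.sleep(1)
    with open("/proc/stat") as f:
        second = read_cpu_line(f.read())
    user, system = user_sys_share(first, second)
    print(f"user {user:.0%}  system {system:.0%}")
```

Running it once during normal load and once during a spike gives two comparable numbers instead of a single top snapshot.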

- Brian F

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)