Re: Occasional giant spikes in CPU load

David Rees <drees76@xxxxxxxxx> · Wed, 7 Apr 2010 15:56:11 -0700

On Wed, Apr 7, 2010 at 2:37 PM, Craig James <craig_james@xxxxxxxxxxxxxx> wrote:
> Most of the time Postgres runs nicely, but two or three times a day we get a
> huge spike in the CPU load that lasts just a short time -- it jumps to 10-20
> CPU loads.  Today it hit 100 CPU loads.  Sometimes days go by with no spike
> events.  During these spikes, the system is completely unresponsive (you
> can't even login via ssh).

You need to find out what all those Postgres processes are doing.  You
might try enabling update_process_title and then using ps to see
what each backend is doing.  Otherwise, you might try logging
statements that take longer than a certain amount of time to run (see
log_min_duration_statement).
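
As a rough sketch of the ps approach: with update_process_title on, each
backend's title should look like "postgres: user database host activity",
so the activity part can be tallied during a spike.  The helper name
tally_activity and the sample lines are made up for illustration, and the
parsing is only a heuristic (auxiliary processes with longer titles may be
miscounted).

```python
from collections import Counter

MARKER = "postgres: "

def tally_activity(ps_lines):
    """Count backends by the activity shown in their process title.

    Assumes titles of the form 'postgres: user database host activity'.
    Titles with fewer than four fields (writer, logger, ...) are lumped
    together as '(non-backend)'.
    """
    counts = Counter()
    for line in ps_lines:
        idx = line.find(MARKER)
        if idx == -1:
            continue  # not a Postgres process
        # Split into at most 4 fields so 'idle in transaction' stays whole.
        fields = line[idx + len(MARKER):].strip().split(None, 3)
        if len(fields) == 4:
            counts[fields[3]] += 1
        else:
            counts["(non-backend)"] += 1
    return counts

# Hypothetical sample, e.g. captured from `ps -eo args`
sample = [
    "postgres: craig mydb 10.0.0.5(41234) SELECT",
    "postgres: craig mydb 10.0.0.5(41235) idle",
    "postgres: craig mydb [local] idle in transaction",
    "postgres: writer process",
    "sshd: craig",
]
for activity, n in tally_activity(sample).most_common():
    print(n, activity)
```

A pile of backends all showing the same activity at spike time would be a
good hint about where to look next.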

> I managed to capture one such event using top(1) with the "batch" option as
> a background process.  See output below - it shows 19 active postgress
> processes, but I think it missed the bulk of the spike.

Looks like it.  The system doesn't appear to be overloaded at all at that point.

> 8 CPUs, 8 GB memory
> 8-disk RAID10 (10k SATA)
> Postgres 8.3.0

You should definitely update to the latest 8.3 release, 8.3.10 - 8.3.0 has a
LOT of known bugs.

> Fedora 8, kernel is 2.6.24.4-64.fc8

Wow, that is very old, too.

> Diffs from original postgres.conf:
>
> max_connections = 1000
> shared_buffers = 2000MB
> work_mem = 256MB

work_mem is way too high for 1000 connections and 8GB of RAM.  Each
backend can use up to work_mem for every sort or hash operation it
runs, so you could simply be starting up too many postgres processes
and overwhelming the machine.  Significantly reduce either
max_connections or work_mem.
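
To put rough numbers on that, here is a back-of-the-envelope sketch.  It
assumes the worst case where every allowed connection runs one
work_mem-sized sort at the same time, and it ignores the OS cache and
per-backend overhead.  The function names are made up for illustration.

```python
def worst_case_work_mem_mb(max_connections, work_mem_mb, ops_per_query=1):
    """Memory needed if every connection uses its full work_mem at once."""
    return max_connections * work_mem_mb * ops_per_query

def work_mem_that_fits_mb(total_ram_mb, shared_buffers_mb, max_connections):
    """Largest per-connection work_mem that fits in RAM left after shared_buffers."""
    return (total_ram_mb - shared_buffers_mb) // max_connections

total_ram_mb = 8 * 1024      # 8 GB machine
shared_buffers_mb = 2000     # from the posted config

worst = worst_case_work_mem_mb(1000, 256)
print(worst, "MB worst case vs", total_ram_mb, "MB of RAM")   # 256000 MB

fits = work_mem_that_fits_mb(total_ram_mb, shared_buffers_mb, 1000)
print("work_mem that would fit with 1000 connections:", fits, "MB")  # 6 MB
```

In practice not every connection sorts at once, so this is an upper bound,
but it shows how far apart the two settings are.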

> max_fsm_pages = 16000000
> max_fsm_relations = 625000
> synchronous_commit = off

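Be careful here.  With synchronous_commit off, a crash can lose the
most recent transactions that were already reported as committed.  It
does not risk corrupting the database the way fsync = off does, but
you should only leave it off if you can afford to lose those last few
transactions.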

> top - 11:24:59 up 81 days, 20:27,  4 users,  load average: 0.98, 0.83, 0.92
> Tasks: 366 total,  20 running, 346 sleeping,   0 stopped,   0 zombie
> Cpu(s): 30.6%us,  1.5%sy,  0.0%ni, 66.3%id,  1.5%wa,  0.0%hi,  0.0%si,
>  0.0%st
> Mem:   8194800k total,  8118688k used,    76112k free,       36k buffers
> Swap:  2031608k total,   169348k used,  1862260k free,  7313232k cached

System load looks very much OK given that you have 8 CPUs.
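
For reference, a small sketch of the arithmetic behind that: divide each
load average from the top header by the number of CPUs.  The helper name
load_per_cpu is made up, and the regex assumes the header format shown in
the quoted output.

```python
import re

def load_per_cpu(top_header, cpus):
    """Return the 1, 5 and 15 minute load averages divided by the CPU count."""
    match = re.search(
        r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)", top_header
    )
    if match is None:
        raise ValueError("no load average found in header")
    return tuple(float(value) / cpus for value in match.groups())

header = "top - 11:24:59 up 81 days, 20:27,  4 users,  load average: 0.98, 0.83, 0.92"
one, five, fifteen = load_per_cpu(header, cpus=8)
print(f"1 min load per CPU: {one:.2f}")
```

Anything well under 1.0 per CPU means the machine had plenty of headroom
at the moment this sample was taken.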

> PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 18972 postgres  20   0 2514m  11m 8752 R   11  0.1   0:00.35 postmaster
> 10618 postgres  20   0 2514m  12m 9456 R    9  0.2   0:00.54 postmaster
> 10636 postgres  20   0 2514m  11m 9192 R    9  0.1   0:00.45 postmaster
> 25903 postgres  20   0 2514m  11m 8784 R    9  0.1   0:00.21 postmaster
> 10626 postgres  20   0 2514m  11m 8716 R    6  0.1   0:00.45 postmaster
> 10645 postgres  20   0 2514m  12m 9352 R    6  0.2   0:00.42 postmaster
> 10647 postgres  20   0 2514m  11m 9172 R    6  0.1   0:00.51 postmaster
> 18502 postgres  20   0 2514m  11m 9016 R    6  0.1   0:00.23 postmaster
> 10641 postgres  20   0 2514m  12m 9296 R    5  0.2   0:00.36 postmaster
> 10051 postgres  20   0 2514m  13m  10m R    4  0.2   0:00.70 postmaster
> 10622 postgres  20   0 2514m  12m 9216 R    4  0.2   0:00.39 postmaster
> 10640 postgres  20   0 2514m  11m 8592 R    4  0.1   0:00.52 postmaster
> 18497 postgres  20   0 2514m  11m 8804 R    4  0.1   0:00.25 postmaster
> 18498 postgres  20   0 2514m  11m 8804 R    4  0.1   0:00.22 postmaster
> 10341 postgres  20   0 2514m  13m   9m R    2  0.2   0:00.57 postmaster
> 10619 postgres  20   0 2514m  12m 9336 R    1  0.2   0:00.38 postmaster
> 15687 postgres  20   0 2321m  35m  35m R    0  0.4   8:36.12 postmaster

Judging by the amount of CPU time each postmaster has accumulated, they
are all fairly new processes.  How many of the ~350 tasks currently
running are Postgres processes?
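
If it helps, here is a rough sketch for picking the young processes out
of a top batch capture.  It assumes the 12-column layout shown above (PID
first, TIME+ as minutes:seconds, COMMAND last); the function names and the
1-second threshold are arbitrary choices for illustration.

```python
def cpu_seconds(time_plus):
    """Convert top's TIME+ value (minutes:seconds.hundredths) to seconds."""
    minutes, seconds = time_plus.split(":")
    return int(minutes) * 60 + float(seconds)

def young_processes(top_rows, command="postmaster", threshold_s=1.0):
    """Return PIDs of `command` processes with less than threshold_s of CPU time."""
    young = []
    for row in top_rows:
        cols = row.split()
        # Expect: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
        if len(cols) < 12 or cols[11] != command:
            continue
        if cpu_seconds(cols[10]) < threshold_s:
            young.append(int(cols[0]))
    return young

rows = [
    "18972 postgres  20   0 2514m  11m 8752 R   11  0.1   0:00.35 postmaster",
    "15687 postgres  20   0 2321m  35m  35m R    0  0.4   8:36.12 postmaster",
]
print(young_processes(rows))
```

Running the same thing over the full capture and comparing the count with
the total number of postmaster rows would answer the question above.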

-Dave

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)