Occasional giant spikes in CPU load

Most of the time Postgres runs nicely, but two or three times a day we get a huge spike in CPU load that lasts just a short time: the load average jumps to 10-20.  Today it hit 100.  Sometimes days go by with no spikes.  During these spikes, the system is completely unresponsive (you can't even log in via ssh).

I managed to capture one such event by running top(1) in "batch" mode as a background process.  See the output below: it shows 19 active postgres processes, but I think it missed the bulk of the spike.
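
Because a fixed-interval top snapshot can miss a spike that lasts only seconds, one alternative is to poll /proc/loadavg and take a process snapshot only when the instantaneous count of runnable tasks jumps. The following is a minimal sketch, assuming Linux and a procps `ps`; the function names, the threshold, the log file name and the choice of `ps` columns are illustrative assumptions, not part of the setup described here.

```python
import subprocess
import time


def parse_loadavg(text):
    """Split /proc/loadavg contents into (load1, load5, load15, runnable, total)."""
    fields = text.split()
    # The fourth field looks like "20/366": runnable tasks / total tasks.
    runnable, total = fields[3].split("/")
    return (float(fields[0]), float(fields[1]), float(fields[2]),
            int(runnable), int(total))


def watch(threshold=8, interval=1.0, max_samples=None, logfile="spike.log"):
    """Poll every `interval` seconds; snapshot processes when runnable > threshold.

    threshold, logfile and the ps columns are hypothetical choices.
    """
    samples = 0
    while max_samples is None or samples < max_samples:
        with open("/proc/loadavg") as f:
            load1, _, _, runnable, _ = parse_loadavg(f.read())
        if runnable > threshold:
            # Busiest processes first; wchan shows what sleeping ones wait on.
            snap = subprocess.run(
                ["ps", "-eo", "pid,stat,pcpu,wchan:20,args", "--sort=-pcpu"],
                capture_output=True, text=True).stdout
            with open(logfile, "a") as out:
                out.write(f"{time.ctime()} load1={load1} "
                          f"runnable={runnable}\n{snap}\n")
        samples += 1
        time.sleep(interval)
```

Calling `watch(threshold=8)` on an 8-CPU machine would log only when more tasks are runnable than there are CPUs. If the machine is as unresponsive as described, the `ps` call itself may be delayed, so the snapshot can still lag the start of the spike.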

For some reason, every postgres backend suddenly decides (is told?) to do something.  When this happens, the system becomes unusable for anywhere from ten seconds to a minute or so, depending on how much web traffic stacks up behind this event.  We have two servers, one offline and one public, and they both do this, so it's not caused by actual web traffic (and the Apache logs don't show any HTTP activity correlated with the spikes).

I thought based on other posts that this might be a background-writer problem, but it's not I/O, it's all CPU as far as I can tell.

Any ideas where I can look to find what's triggering this?

8 CPUs, 8 GB memory
8-disk RAID10 (10k SATA)
Postgres 8.3.0
Fedora 8, kernel is 2.6.24.4-64.fc8
Diffs from original postgresql.conf:

max_connections = 1000
shared_buffers = 2000MB
work_mem = 256MB
max_fsm_pages = 16000000
max_fsm_relations = 625000
synchronous_commit = off
wal_buffers = 256kB
checkpoint_segments = 30
effective_cache_size = 4GB
escape_string_warning = off
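
A rough sanity check on the memory settings above, as a sketch rather than a diagnosis: work_mem is a per-operation limit (each sort or hash in each backend may use up to that much), so shared_buffers plus one full work_mem per connection is only a lower bound on the worst case. The helper names are hypothetical, and nothing here establishes that memory pressure is what causes the spikes.

```python
def worst_case_mb(shared_buffers_mb, work_mem_mb, max_connections):
    """Lower bound on peak memory if every connection runs one sort or
    hash operation at its full work_mem allowance."""
    return shared_buffers_mb + work_mem_mb * max_connections


def backends_that_fit(ram_mb, shared_buffers_mb, work_mem_mb):
    """Number of backends that can each hold one full work_mem
    allocation in the RAM left after shared_buffers."""
    return (ram_mb - shared_buffers_mb) // work_mem_mb


RAM_MB = 8 * 1024  # 8 GB, from the hardware list

peak = worst_case_mb(2000, 256, 1000)
fit = backends_that_fit(RAM_MB, 2000, 256)
print(f"worst case: {peak} MB (about {peak / 1024:.0f} GB) vs {RAM_MB} MB RAM")
print(f"backends that fit with one full work_mem each: {fit}")
```

With these settings the bound is 258000 MB, roughly 252 GB against 8 GB of physical memory, and about 24 backends could each use a full work_mem allocation before the machine would have to swap.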

Thanks,
Craig


top - 11:24:59 up 81 days, 20:27,  4 users,  load average: 0.98, 0.83, 0.92
Tasks: 366 total,  20 running, 346 sleeping,   0 stopped,   0 zombie
Cpu(s): 30.6%us,  1.5%sy,  0.0%ni, 66.3%id,  1.5%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   8194800k total,  8118688k used,    76112k free,       36k buffers
Swap:  2031608k total,   169348k used,  1862260k free,  7313232k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
18972 postgres  20   0 2514m  11m 8752 R   11  0.1   0:00.35 postmaster
10618 postgres  20   0 2514m  12m 9456 R    9  0.2   0:00.54 postmaster
10636 postgres  20   0 2514m  11m 9192 R    9  0.1   0:00.45 postmaster
25903 postgres  20   0 2514m  11m 8784 R    9  0.1   0:00.21 postmaster
10626 postgres  20   0 2514m  11m 8716 R    6  0.1   0:00.45 postmaster
10645 postgres  20   0 2514m  12m 9352 R    6  0.2   0:00.42 postmaster
10647 postgres  20   0 2514m  11m 9172 R    6  0.1   0:00.51 postmaster
18502 postgres  20   0 2514m  11m 9016 R    6  0.1   0:00.23 postmaster
10641 postgres  20   0 2514m  12m 9296 R    5  0.2   0:00.36 postmaster
10051 postgres  20   0 2514m  13m  10m R    4  0.2   0:00.70 postmaster
10622 postgres  20   0 2514m  12m 9216 R    4  0.2   0:00.39 postmaster
10640 postgres  20   0 2514m  11m 8592 R    4  0.1   0:00.52 postmaster
18497 postgres  20   0 2514m  11m 8804 R    4  0.1   0:00.25 postmaster
18498 postgres  20   0 2514m  11m 8804 R    4  0.1   0:00.22 postmaster
10341 postgres  20   0 2514m  13m   9m R    2  0.2   0:00.57 postmaster
10619 postgres  20   0 2514m  12m 9336 R    1  0.2   0:00.38 postmaster
15687 postgres  20   0 2321m  35m  35m R    0  0.4   8:36.12 postmaster
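
When many such snapshots are collected, it can help to reduce each one to a couple of numbers before comparing them. The following is a minimal sketch, assuming the 12-column top layout shown above (PID, USER, PR, NI, VIRT, RES, SHR, S, %CPU, %MEM, TIME+, COMMAND); the function name and its `command` parameter are hypothetical.

```python
def summarize_top(text, command="postmaster"):
    """Return (running_count, total_cpu_percent) for rows of `command`
    whose state column is R (running)."""
    running, cpu = 0, 0.0
    for line in text.splitlines():
        fields = line.split()
        # Process rows have 12 columns and start with a numeric PID;
        # summary lines such as "Tasks:" or "Mem:" are skipped.
        if len(fields) < 12 or not fields[0].isdigit():
            continue
        state, pcpu, cmd = fields[7], fields[8], fields[11]
        if cmd == command and state == "R":
            running += 1
            cpu += float(pcpu)
    return running, cpu
```

Feeding it the saved text of each batch-mode iteration gives one (count, CPU) pair per sample, which makes it easier to see when the number of simultaneously running backends jumps.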



--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
