On Wed, Apr 7, 2010 at 2:37 PM, Craig James <craig_james@xxxxxxxxxxxxxx> wrote:
> Most of the time Postgres runs nicely, but two or three times a day we get a
> huge spike in the CPU load that lasts just a short time -- it jumps to 10-20
> CPU loads.  Today it hit 100 CPU loads.  Sometimes days go by with no spike
> events.  During these spikes, the system is completely unresponsive (you
> can't even login via ssh).

You need to find out what all those Postgres processes are doing. You
might try enabling update_process_title and then using ps to see what
each backend is doing. Otherwise, you might try logging every statement
that takes longer than a given amount of time to run (see
log_min_duration_statement).

> I managed to capture one such event using top(1) with the "batch" option as
> a background process.  See output below - it shows 19 active postgres
> processes, but I think it missed the bulk of the spike.

Looks like it. The system doesn't appear to be overloaded at all at that point.

> 8 CPUs, 8 GB memory
> 8-disk RAID10 (10k SATA)
> Postgres 8.3.0

You should definitely update to the latest release, 8.3.10. 8.3.0 has a
LOT of known bugs.

> Fedora 8, kernel is 2.6.24.4-64.fc8

Wow, that is very old, too.

> Diffs from original postgres.conf:
>
> max_connections = 1000
> shared_buffers = 2000MB
> work_mem = 256MB

work_mem is way too high for 1000 connections and 8GB of RAM. Each sort
or hash step in each backend can use up to work_mem, so you could simply
be starting up too many postgres processes and overwhelming the machine.
Significantly reduce either max_connections or work_mem.

> max_fsm_pages = 16000000
> max_fsm_relations = 625000
> synchronous_commit = off

Be careful here. With synchronous_commit off, a crash can lose the most
recent transactions even though they were reported as committed. Unlike
fsync = off it does not put the database at risk of corruption, but you
should only turn it off if you can afford to lose those transactions.

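To put rough numbers on the work_mem point above, here is a small back-of-the-envelope sketch. The helper names are made up for illustration, and it assumes the simplest possible case: every connection is in use and each runs exactly one sort or hash step at a time. Real queries can use several multiples of work_mem, and the OS and its cache need memory too, so treat the result as an optimistic bound and not as a tuning recommendation.

```python
# Back-of-the-envelope memory check for the settings quoted above.
# Helper names are hypothetical; the numbers come from the posted config.
MB = 1024 * 1024
GB = 1024 * MB

max_connections = 1000
work_mem = 256 * MB
shared_buffers = 2000 * MB
physical_ram = 8 * GB


def worst_case_work_mem(connections, work_mem_bytes, ops_per_query=1):
    """Memory needed if every connection runs ops_per_query sort/hash
    steps at once and each step uses its full work_mem allowance."""
    return connections * work_mem_bytes * ops_per_query


def max_safe_work_mem(connections, ram_bytes, shared_buffers_bytes,
                      ops_per_query=1):
    """Largest work_mem that keeps the worst case within the RAM left
    over after shared_buffers (ignores OS and filesystem cache needs)."""
    available = ram_bytes - shared_buffers_bytes
    return available // (connections * ops_per_query)


if __name__ == "__main__":
    total = worst_case_work_mem(max_connections, work_mem)
    print("worst case: %.0f GB on a %d GB machine"
          % (total / GB, physical_ram // GB))

    limit = max_safe_work_mem(max_connections, physical_ram, shared_buffers)
    print("work_mem ceiling at 1000 connections: %.1f MB" % (limit / MB))
```

Under those assumptions the worst case is 250 GB against 8 GB of RAM, which is why one of the two settings has to come down by a large factor.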
> top - 11:24:59 up 81 days, 20:27,  4 users,  load average: 0.98, 0.83, 0.92
> Tasks: 366 total,  20 running, 346 sleeping,   0 stopped,   0 zombie
> Cpu(s): 30.6%us,  1.5%sy,  0.0%ni, 66.3%id,  1.5%wa,  0.0%hi,  0.0%si,  0.0%st
> Mem:   8194800k total,  8118688k used,    76112k free,       36k buffers
> Swap:  2031608k total,   169348k used,  1862260k free,  7313232k cached

System load looks fine given that you have 8 CPUs.

>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 18972 postgres  20   0 2514m  11m 8752 R   11  0.1   0:00.35 postmaster
> 10618 postgres  20   0 2514m  12m 9456 R    9  0.2   0:00.54 postmaster
> 10636 postgres  20   0 2514m  11m 9192 R    9  0.1   0:00.45 postmaster
> 25903 postgres  20   0 2514m  11m 8784 R    9  0.1   0:00.21 postmaster
> 10626 postgres  20   0 2514m  11m 8716 R    6  0.1   0:00.45 postmaster
> 10645 postgres  20   0 2514m  12m 9352 R    6  0.2   0:00.42 postmaster
> 10647 postgres  20   0 2514m  11m 9172 R    6  0.1   0:00.51 postmaster
> 18502 postgres  20   0 2514m  11m 9016 R    6  0.1   0:00.23 postmaster
> 10641 postgres  20   0 2514m  12m 9296 R    5  0.2   0:00.36 postmaster
> 10051 postgres  20   0 2514m  13m  10m R    4  0.2   0:00.70 postmaster
> 10622 postgres  20   0 2514m  12m 9216 R    4  0.2   0:00.39 postmaster
> 10640 postgres  20   0 2514m  11m 8592 R    4  0.1   0:00.52 postmaster
> 18497 postgres  20   0 2514m  11m 8804 R    4  0.1   0:00.25 postmaster
> 18498 postgres  20   0 2514m  11m 8804 R    4  0.1   0:00.22 postmaster
> 10341 postgres  20   0 2514m  13m   9m R    2  0.2   0:00.57 postmaster
> 10619 postgres  20   0 2514m  12m 9336 R    1  0.2   0:00.38 postmaster
> 15687 postgres  20   0 2321m  35m  35m R    0  0.4   8:36.12 postmaster

Judging by the amount of CPU time each postmaster has accumulated,
nearly all of them are fairly new processes. How many of the ~350
processes currently running are Postgres processes?

-Dave

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance