Hello, all.
I have a Postgresql 8.2.5 running on gentoo linux on sun fire v240
server with storage 3310. And sometimes I have a problem: time to time
one postgres process starts using all CPU time. I can't kill this
process. I can't stop postgres. I can't shutdown server at all from
OS, only reset helps. When it happen (CPU usage is 100%) for about 30
minutes DB works (new connections may establish, queries works fine).
But in 30 minutes DB stop responding.
I understand that this description can't give you enough information
to give an advice what to do. But I don't know what to collect from OS/
PG when it will happen again. Could you give me an advice what should
I do when next time one of process will use all CPU time?
I guess that I can collect stack from this process at that time by
using GDB or something like it. But I have no experience how exactly
to do it.
About DB. It is about 40Gb statistical DB. Every minute some processes
inserts data collected from different sources. The biggest part of
types of statements from clients is INSERT. Also, I have a WEB
interface to this DB which shows a lot of graphs. I have some
aggregation functions in PL/PGSQL which SELECT info and then INSERT or
UPDATE it in some tables. I never use DELETE (except some cases. but
I'm sure that problem appears not while DELETE).
Every day scripts inserts data from the same sources via the same
queries. But problem appears at different time.
More info:
postgres=# show server_version;
server_version
----------------
8.2.4
machupicchu ~ # uname -a
Linux machupicchu 2.6.20-gentoo-r8 #1 SMP Fri Jun 22 09:51:20 MSD 2007
sparc64 sun4u TI UltraSparc IIIi (Jalapeno) GNU/Linux
machupicchu pgdata # grep -v "^#" postgresql.conf | sort -u
autovacuum = on
autovacuum_analyze_scale_factor = 0.1
autovacuum_analyze_threshold = 250
autovacuum_naptime = 10min
autovacuum_vacuum_scale_factor = 0.2
autovacuum_vacuum_threshold = 500
client_min_messages = notice
constraint_exclusion = on
datestyle = 'iso, mdy'
enable_bitmapscan = on
enable_hashagg = on
enable_hashjoin = on
enable_indexscan = on
enable_mergejoin = on
enable_nestloop = on
enable_seqscan = on
enable_sort = on
enable_tidscan = on
lc_messages = 'C'
lc_monetary = 'C'
lc_numeric = 'C'
lc_time = 'C'
listen_addresses = '*'
log_directory = 'pg_log'
log_duration = on
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_line_prefix = '%m, %s, %r, %p, '
log_min_duration_statement = 1000
log_min_error_statement = error
log_min_messages = error
log_statement = 'all'
log_truncate_on_rotation = off
maintenance_work_mem = 1572864
max_connections = 100
max_fsm_pages = 153600
redirect_stderr = on
shared_buffers = 24MB
stats_block_level = on
stats_command_string = on
stats_row_level = on
stats_start_collector = on
work_mem = 1048576
---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?
http://archives.postgresql.org/