I've reviewed the recent threads on out-of-memory problems, and we had
a similar issue last night. We have 11 postmasters running on two
machines in a cluster environment: five on one, six on the other.
They've been running this way for a little over a year now.
Configuration:
Quad dual-core Opterons
8 gig memory
Red Hat Advanced Server 4
relevant postgresql.conf settings:
tcpip_socket = true
max_connections = 35
shared_buffers = 16000
checkpoint_segments = 10
log_min_error_statement = warning
log_connections = true
log_pid = true
log_timestamp = true
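
For scale, here is a rough sketch of what shared_buffers = 16000 comes
to in memory. It assumes the default 8 kB block size, that every
postmaster on a machine uses the same setting, and that the 8 gig figure
is per machine; none of that is stated above, so treat the numbers as an
estimate. It covers only the buffer cache, not per-backend memory.

```python
# Rough buffer-cache estimate from the settings above.
# Assumptions (not stated in the post): default 8 kB block size, identical
# shared_buffers for every postmaster, and 8 GiB of RAM per machine.
BLOCK_SIZE_BYTES = 8192          # assumed default PostgreSQL block size
SHARED_BUFFERS_PAGES = 16000     # shared_buffers from postgresql.conf
TOTAL_RAM_BYTES = 8 * 1024**3    # "8 gig memory", assumed per machine


def shared_buffers_bytes(pages: int, block_size: int = BLOCK_SIZE_BYTES) -> int:
    """Return the buffer cache size in bytes for one postmaster."""
    return pages * block_size


def cluster_share(postmasters: int) -> float:
    """Fraction of RAM used by the buffer caches of `postmasters` instances."""
    return postmasters * shared_buffers_bytes(SHARED_BUFFERS_PAGES) / TOTAL_RAM_BYTES


per_instance_mib = shared_buffers_bytes(SHARED_BUFFERS_PAGES) / 1024**2
print(f"per postmaster: {per_instance_mib:.0f} MiB")
for count in (5, 6):  # five postmasters on one machine, six on the other
    print(f"{count} postmasters: {count * per_instance_mib:.0f} MiB "
          f"({cluster_share(count):.1%} of RAM)")
```

Under those assumptions each postmaster reserves about 125 MiB for its
buffer cache, so the buffer caches alone should account for well under
1 GiB per machine.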
We run a 'vacuum full analyze' once a week (and I've seen a thread that
says this should not be necessary).
Just the same, last night, while running a nightly 'vacuum full' process
for our largest database (7.5G base), the vacuum process was killed by
the OS because the machine ran out of memory.
Aug 27 00:59:07 gan-lxc-01 kernel: Out of Memory: Killed process 26169
(postmaster).
Process 26169 appears to correspond to the backend running the vacuum,
not the database postmaster process. The postmaster process did not
die. We did see the following in the database log:
2007-08-27 00:59:07 [13586] LOG: server process (PID 26169) was
terminated by signal 9
2007-08-27 00:59:07 [13586] LOG: terminating any other active server
processes
2007-08-27 00:59:07 [7790] WARNING: terminating connection because of
crash of another server process
DETAIL: The postmaster has commanded this server process to roll back
the current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and
repeat your command.
2007-08-27 00:59:07 [13586] LOG: all server processes terminated;
reinitializing
2007-08-27 00:59:07 [2031] LOG: database system was interrupted at
2007-08-27 00:58:59 EDT
2007-08-27 00:59:07 [2031] LOG: checkpoint record is at 18/B3DF3B94
2007-08-27 00:59:07 [2031] LOG: redo record is at 18/B3DF3B94; undo
record is at 0/0; shutdown FALSE
2007-08-27 00:59:07 [2031] LOG: next transaction ID: 63340557; next
OID: 6459085
2007-08-27 00:59:07 [2031] LOG: database system was not properly shut
down; automatic recovery in progress
2007-08-27 00:59:07 [2031] LOG: redo starts at 18/B3DF3BD4
2007-08-27 00:59:08 [2033] LOG: connection received:
host=198.212.166.38 port=33787
2007-08-27 00:59:08 [2033] FATAL: the database system is starting up
2007-08-27 00:59:11 [2035] LOG: connection received:
host=XXX.XXX.XXX.XXX port=33788
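
In case it helps anyone check the same thing, this is a minimal sketch
of how the PID in the kernel message can be matched against the database
log. The two sample lines are copied from above with the wrapping
removed. The regular expressions and function names are my own,
hypothetical ones written for these two messages only, not anything
PostgreSQL or the kernel provides.

```python
import re

# Sample lines copied from the kernel log and the database log above
# (wrapped lines joined back together).
KERNEL_LINE = ("Aug 27 00:59:07 gan-lxc-01 kernel: Out of Memory: "
               "Killed process 26169 (postmaster).")
SERVER_LINE = ("2007-08-27 00:59:07 [13586] LOG: server process (PID 26169) "
               "was terminated by signal 9")

# Hypothetical patterns, written for these two message formats only.
KERNEL_RE = re.compile(r"Out of Memory: Killed process (\d+) \((\S+)\)")
SERVER_RE = re.compile(r"\[(\d+)\] LOG: server process \(PID (\d+)\) "
                       r"was terminated by signal (\d+)")


def killed_pid(kernel_line):
    """Return the PID named in a kernel OOM message, or None if no match."""
    match = KERNEL_RE.search(kernel_line)
    return int(match.group(1)) if match else None


def terminated_backend(server_line):
    """Return (reporting_pid, backend_pid, signal) from the log, or None."""
    match = SERVER_RE.search(server_line)
    return tuple(int(group) for group in match.groups()) if match else None


oom_pid = killed_pid(KERNEL_LINE)
reporter, backend, signal_no = terminated_backend(SERVER_LINE)

# The killed PID matches the terminated backend, not the process that
# logged the message.
print(f"kernel killed PID {oom_pid}")
print(f"log reporter PID {reporter}, backend PID {backend}, signal {signal_no}")
print("killed process was a backend:", oom_pid == backend and oom_pid != reporter)
```

The process that wrote the log line (13586) is different from the one
that was killed (26169), which is why I believe the OS killed the vacuum
backend rather than the postmaster.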
So, my question is, based on the configuration of this box and the
configuration of postgresql, can anyone point to anything that might
cause this to happen?
--
Until later, Geoffrey
Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety.
- Benjamin Franklin