vacuum killed because of out of memory

I've reviewed the various recent threads on out-of-memory problems, and we had a similar issue last night. We have 11 postmasters running on two machines in a cluster environment: five on one, six on the other. They've been running this way for a little over a year now.

Configuration:
Quad dual-core Opterons
8 gig memory
Red Hat Advanced Server 4

relevant postgresql.conf settings:

tcpip_socket = true
max_connections = 35
shared_buffers = 16000
checkpoint_segments = 10
log_min_error_statement = warning
log_connections = true
log_pid = true
log_timestamp = true
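
As a rough sanity check on these settings, the shared_buffers figure can be converted from pages to megabytes and multiplied across the postmasters on the busier machine. This is only a sketch: it assumes the default 8 kB block size and that all six postmasters use the same settings, neither of which is stated above, and the function names are made up for illustration.

```python
# Rough shared-memory arithmetic for the postgresql.conf settings above.
# Assumption: the default 8 kB block size; a custom build may differ.
BLOCK_SIZE_BYTES = 8192


def shared_buffers_mb(buffers, block_size=BLOCK_SIZE_BYTES):
    """Convert a shared_buffers page count to megabytes."""
    return buffers * block_size / (1024 * 1024)


def cluster_summary(postmasters, buffers, max_connections):
    """Total shared buffers (MB) and worst-case backend count on one machine.

    Assumption: every postmaster on the machine uses the same settings.
    """
    return {
        "shared_buffers_mb": postmasters * shared_buffers_mb(buffers),
        "max_backends": postmasters * max_connections,
    }


if __name__ == "__main__":
    per_instance = shared_buffers_mb(16000)
    busier_box = cluster_summary(postmasters=6, buffers=16000, max_connections=35)
    print(f"per postmaster: {per_instance:.0f} MB")
    print(
        f"six postmasters: {busier_box['shared_buffers_mb']:.0f} MB, "
        f"up to {busier_box['max_backends']} backends"
    )
```

Under those assumptions the shared buffers come to about 125 MB per postmaster and 750 MB for six, well short of 8 gig. The sketch does not account for memory each backend allocates privately, which the settings listed above do not cover.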

We run a 'vacuum full analyze' once a week (and I've seen a thread that says this should not be necessary).

Just the same, last night, while running a nightly 'vacuum full' process for our largest database (7.5G base), the vacuum process was killed by the OS because of out of memory issues.

Aug 27 00:59:07 gan-lxc-01 kernel: Out of Memory: Killed process 26169 (postmaster).

The process 26169 does appear to correspond to the vacuum process and not the database postmaster process. The postmaster process did not die. We did see the following in the database log:

2007-08-27 00:59:07 [13586] LOG: server process (PID 26169) was terminated by signal 9
2007-08-27 00:59:07 [13586] LOG: terminating any other active server processes
2007-08-27 00:59:07 [7790] WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
2007-08-27 00:59:07 [13586] LOG: all server processes terminated; reinitializing
2007-08-27 00:59:07 [2031] LOG: database system was interrupted at 2007-08-27 00:58:59 EDT
2007-08-27 00:59:07 [2031] LOG:  checkpoint record is at 18/B3DF3B94
2007-08-27 00:59:07 [2031] LOG: redo record is at 18/B3DF3B94; undo record is at 0/0; shutdown FALSE
2007-08-27 00:59:07 [2031] LOG: next transaction ID: 63340557; next OID: 6459085
2007-08-27 00:59:07 [2031] LOG: database system was not properly shut down; automatic recovery in progress
2007-08-27 00:59:07 [2031] LOG:  redo starts at 18/B3DF3BD4
2007-08-27 00:59:08 [2033] LOG: connection received: host=198.212.166.38 port=33787
2007-08-27 00:59:08 [2033] FATAL:  the database system is starting up
2007-08-27 00:59:11 [2035] LOG: connection received: host=XXX.XXX.XXX.XXX port=33788
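
The reasoning that PID 26169 was a backend rather than the postmaster itself can be checked mechanically by comparing the PID in the kernel message with the PIDs in the database log. The following is a minimal sketch; the two sample lines are copied from the logs quoted in this message, while the regular expressions and function names are illustrative assumptions, not part of any PostgreSQL tool.

```python
import re

# Sample lines copied from the kernel log and the database log in this message.
KERNEL_LINE = (
    "Aug 27 00:59:07 gan-lxc-01 kernel: "
    "Out of Memory: Killed process 26169 (postmaster)."
)
PG_LINE = (
    "2007-08-27 00:59:07 [13586] LOG: "
    "server process (PID 26169) was terminated by signal 9"
)


def oom_killed_pid(line):
    """Return the PID named in a kernel OOM-kill message, or None."""
    match = re.search(r"Out of Memory: Killed process (\d+)", line)
    return int(match.group(1)) if match else None


def terminated_backend(line):
    """Return (reporting_pid, backend_pid, signal) from the log line, or None."""
    match = re.search(
        r"\[(\d+)\] LOG:\s+server process \(PID (\d+)\) "
        r"was terminated by signal (\d+)",
        line,
    )
    return tuple(int(g) for g in match.groups()) if match else None


if __name__ == "__main__":
    killed = oom_killed_pid(KERNEL_LINE)
    reporter, backend, signal_number = terminated_backend(PG_LINE)
    # The killed PID matches the backend; the reporting process is a different PID.
    print(killed == backend, killed != reporter, signal_number)
```

The kernel labels the victim "postmaster" because that is the executable name shared by every server process, so the PID comparison is what distinguishes the backend from the parent.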

So, my question is, based on the configuration of this box and the configuration of postgresql, can anyone point to anything that might cause this to happen?

--
Until later, Geoffrey

Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety.
 - Benjamin Franklin
