We had a run away process on our database box that used up all the physical and all the virtual memory (swap). This caused the RedHat Linux oom-killer to kill many processes, including some Postgres ones. Postgres went into a funky state after that time: 2008-06-20 14:19:10 CDT [unknown] LOG: invalid length of startup packet 2008-06-20 14:20:50 CDT ERROR: canceling autovacuum task 2008-06-20 14:20:50 CDT CONTEXT: automatic vacuum of table "lms_nna.pg_catalog.pg_listener" 2008-06-20 14:21:48 CDT ERROR: canceling autovacuum task 2008-06-20 14:21:49 CDT CONTEXT: automatic vacuum of table "lms_infiniti.pg_catalog.pg_listener" 2008-06-20 14:27:33 CDT WARNING: worker took too long to start; cancelled 2008-06-20 14:45:53 CDT LOG: server process (PID 22435) was terminated by signal 9: Killed ... 2008-06-20 14:50:28 CDT <dbname> DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. 2008-06-20 14:50:28 CDT <dbname> HINT: In a moment you should be able to reconnect to the database and repeat your command. ... 2008-06-20 14:50:36 CDT <dbname> FATAL: the database system is in recovery mode Now any attempt to connect to the database yield the message "the database system is in recovery mode". The processes that are still running are: postgres 31103 0.0 0.0 149024 492 ? S Apr19 8:18 /opt/lms/64/postgres/8.3/bin/postmaster -p 6544 -D /pgdata/8.3/data postgres 31109 0.0 0.0 65408 104 ? Ss Apr19 1:54 \_ postgres: logger process postgres 32645 0.0 0.0 152708 564 ? Ss Jun15 3:14 \_ postgres: slony <databasename> 172.16.172.246(41114) idle in transaction postgres 1179 0.0 0.0 152708 348 ? Ss Jun15 3:30 \_ postgres: slony <databasename> 172.16.172.246(41138) idle in transaction I executed strace and gdb on each process except for 31109 (the logger process) Process 31103 attached - interrupt to quit # Main postmaster select(6, [3 5], NULL, NULL, {15, 795000} <unfinished ...> Process 31103 detached [postgres@126053-db2 64]$ strace -p 32645 # connection #1 Process 32645 attached - interrupt to quit futex(0x3954d32930, FUTEX_WAIT, 2, NULL <unfinished ...> Process 32645 detached [postgres@126053-db2 64]$ strace -p 1179 # connection #2 Process 1179 attached - interrupt to quit futex(0x3954d32930, FUTEX_WAIT, 2, NULL <unfinished ...> Process 1179 detached I then used gdb to gather more information: # # For main postmaster process # [postgres@126053-db2 64]$ gdb 31103 GNU gdb Red Hat Linux (6.3.0.0-1.153.el4_6.2rh) Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "x86_64-redhat-linux-gnu"...31103: No such file or directory. [postgres@126053-db2 64]$ gdb /opt/lms/postgres-8.3_64/bin/postmaster 31103 This GDB was configured as "x86_64-redhat-linux-gnu"...(no debugging symbols found) Using host libthread_db library "/lib64/tls/libthread_db.so.1". Attaching to program: /opt/lms/64/postgres/8.3/bin/postmaster, process 31103 Reading symbols from /usr/lib64/libxslt.so.1...(no debugging symbols found)...done. Loaded symbols for /usr/lib64/libxslt.so.1 Reading symbols from /usr/lib64/libxml2.so.2...(no debugging symbols found)...done. Loaded symbols for /usr/lib64/libxml2.so.2 Reading symbols from /lib64/libcrypt.so.1...(no debugging symbols found)...done. Loaded symbols for /lib64/libcrypt.so.1 Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done. Loaded symbols for /lib64/libdl.so.2 Reading symbols from /lib64/tls/libm.so.6...(no debugging symbols found)...done. Loaded symbols for /lib64/tls/libm.so.6 Reading symbols from /lib64/tls/libc.so.6... (no debugging symbols found)...done. Loaded symbols for /lib64/tls/libc.so.6 Reading symbols from /usr/lib64/libz.so.1...(no debugging symbols found)...done. Loaded symbols for /usr/lib64/libz.so.1 Reading symbols from /lib64/tls/libpthread.so.0...(no debugging symbols found)...done. [Thread debugging using libthread_db enabled] [New Thread 182894088192 (LWP 31103)] Loaded symbols for /lib64/tls/libpthread.so.0 Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done. Loaded symbols for /lib64/ld-linux-x86-64.so.2 Reading symbols from /lib64/libnss_files.so.2...(no debugging symbols found)...done. Loaded symbols for /lib64/libnss_files.so.2 0x0000003954bc0125 in __select_nocancel () from /lib64/tls/libc.so.6 # # For connection no. 1 # [postgres@126053-db2 64]$ gdb /opt/lms/postgres-8.3_64/bin/postmaster 32645 This GDB was configured as "x86_64-redhat-linux-gnu"...(no debugging symbols found) Using host libthread_db library "/lib64/tls/libthread_db.so.1". Attaching to program: /opt/lms/64/postgres/8.3/bin/postmaster, process 32645 Reading symbols from /usr/lib64/libxslt.so.1...(no debugging symbols found)...done. Loaded symbols for /usr/lib64/libxslt.so.1 Reading symbols from /usr/lib64/libxml2.so.2...(no debugging symbols found)...done. Loaded symbols for /usr/lib64/libxml2.so.2 Reading symbols from /lib64/libcrypt.so.1...(no debugging symbols found)...done. Loaded symbols for /lib64/libcrypt.so.1 Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done. Loaded symbols for /lib64/libdl.so.2 Reading symbols from /lib64/tls/libm.so.6...(no debugging symbols found)...done. Loaded symbols for /lib64/tls/libm.so.6 Reading symbols from /lib64/tls/libc.so.6... (no debugging symbols found)...done. Loaded symbols for /lib64/tls/libc.so.6 Reading symbols from /usr/lib64/libz.so.1...(no debugging symbols found)...done. Loaded symbols for /usr/lib64/libz.so.1 Reading symbols from /lib64/tls/libpthread.so.0...(no debugging symbols found)...done. [Thread debugging using libthread_db enabled] [New Thread 182894088192 (LWP 32645)] Loaded symbols for /lib64/tls/libpthread.so.0 Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done. Loaded symbols for /lib64/ld-linux-x86-64.so.2 Reading symbols from /lib64/libnss_files.so.2...(no debugging symbols found)...done. Loaded symbols for /lib64/libnss_files.so.2 Reading symbols from /opt/lms/64/postgres/8.3/lib/plpgsql.so... (no debugging symbols found)...done. Loaded symbols for /opt/lms/64/postgres/8.3/lib/plpgsql.so Reading symbols from /opt/lms/64/postgres/8.3/lib/xxid.so...done. Loaded symbols for /opt/lms/64/postgres/8.3/lib/xxid.so 0x0000003954bd28db in __lll_mutex_lock_wait () from /lib64/tls/libc.so.6 # # For connection no. 3 # [postgres@126053-db2 64]$ gdb /opt/lms/postgres-8.3_64/bin/postmaster 1179 This GDB was configured as "x86_64-redhat-linux-gnu"...(no debugging symbols found) Using host libthread_db library "/lib64/tls/libthread_db.so.1". Attaching to program: /opt/lms/64/postgres/8.3/bin/postmaster, process 1179 Reading symbols from /usr/lib64/libxslt.so.1...(no debugging symbols found)...done. Loaded symbols for /usr/lib64/libxslt.so.1 Reading symbols from /usr/lib64/libxml2.so.2...(no debugging symbols found)...done. Loaded symbols for /usr/lib64/libxml2.so.2 Reading symbols from /lib64/libcrypt.so.1...(no debugging symbols found)...done. Loaded symbols for /lib64/libcrypt.so.1 Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done. Loaded symbols for /lib64/libdl.so.2 Reading symbols from /lib64/tls/libm.so.6...(no debugging symbols found)...done. Loaded symbols for /lib64/tls/libm.so.6 Reading symbols from /lib64/tls/libc.so.6... (no debugging symbols found)...done. Loaded symbols for /lib64/tls/libc.so.6 Reading symbols from /usr/lib64/libz.so.1...(no debugging symbols found)...done. Loaded symbols for /usr/lib64/libz.so.1 Reading symbols from /lib64/tls/libpthread.so.0...(no debugging symbols found)...done. [Thread debugging using libthread_db enabled] [New Thread 182894088192 (LWP 1179)] Loaded symbols for /lib64/tls/libpthread.so.0 Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done. Loaded symbols for /lib64/ld-linux-x86-64.so.2 Reading symbols from /lib64/libnss_files.so.2...(no debugging symbols found)...done. Loaded symbols for /lib64/libnss_files.so.2 Reading symbols from /opt/lms/64/postgres/8.3/lib/plpgsql.so... (no debugging symbols found)...done. Loaded symbols for /opt/lms/64/postgres/8.3/lib/plpgsql.so Reading symbols from /opt/lms/64/postgres/8.3/lib/xxid.so...done. Loaded symbols for /opt/lms/64/postgres/8.3/lib/xxid.so 0x0000003954bd28db in __lll_mutex_lock_wait () from /lib64/tls/libc.so.6 I think the fact that a process used up all the available memory (physical and virtual) caused Postgres to go into a weird state. Now it will not respond to kill, or pg_ctl for shutdown. Would the right thing to do be using kill -9 to stop the server? -- - John L Cheng