Hi
all,
this
afternoon I had a database crash on a test system (The first Postgres crash in
more than three years).
I wasn't
able to identify the reason.....the log doesn't say anything
interesting.
Is there way
the exactly identify the reason of the crash ?
After having
detected the crash, Postgres (v. 8.3.6) started to kill all the other postgres
processes
After many
minutes, it was still keeping on refusing connections saying the db was still
recovering.
Trying
stopping postgres wasn't possible
I noticed
that a process was still alive and I killed it. After that all Postgres process
disappeared and I was able to restart Postgres.
Now
everything works..
Is there an
explanation for this behaviour ?
I include
messages from the log and from the shell hoping is more
clear.
Thank you
very much
Marco
---------------------------------------
2009-05-19 16:04:18 CEST
0 4a092d56.1535 - LOG: server process (PID 22298) was terminated by signal
9: Killed
2009-05-19 16:04:18 CEST 0 4a092d56.1535 - LOG: terminating any other active server processes
2009-05-19 16:04:18 CEST 172.30.2.143(49041) user db 0 4a12b4ab.54a9 - WARNING: terminating connection because of crash of another server process
2009-05-19 16:04:18 CEST 172.30.2.143(49041) user db 0 4a12b4ab.54a9 - DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and
possibly corrupted shared memory.
2009-05-19 16:04:18 CEST 0 4a092d56.1535 - LOG: terminating any other active server processes
2009-05-19 16:04:18 CEST 172.30.2.143(49041) user db 0 4a12b4ab.54a9 - WARNING: terminating connection because of crash of another server process
2009-05-19 16:04:18 CEST 172.30.2.143(49041) user db 0 4a12b4ab.54a9 - DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and
possibly corrupted shared memory.
..... kills
many db process .......
2009-05-19 16:05:56 CEST
172.30.2.143(49052) user db 0 4a12bcc4.5a5a - FATAL: the database system
is in recovery mode
2009-05-19 16:06:15 CEST 172.30.2.143(49053) user db 0 4a12bcd7.5a5b - FATAL: the database system is in recovery mode
2009-05-19 16:06:30 CEST 172.30.2.143(49054) user db 0 4a12bce6.5a69 - FATAL: the database system is in recovery mode
2009-05-19 16:06:42 CEST 172.30.2.143(49055) user db 0 4a12bcf2.5a6a - FATAL: the database system is in recovery mode
2009-05-19 16:08:15 CEST 172.30.2.143(49056) user db 0 4a12bd4f.5a93 - FATAL: the database system is in recovery mode
2009-05-19 16:09:57 CEST 172.31.2.139(3041) user db 0 4a12bdb5.5b09 - FATAL: the database system is in recovery mode
2009-05-19 16:10:42 CEST 172.31.1.1(2358) user db 0 4a12bde2.5b71 - FATAL: the database system is in recovery mode
2009-05-19 16:06:15 CEST 172.30.2.143(49053) user db 0 4a12bcd7.5a5b - FATAL: the database system is in recovery mode
2009-05-19 16:06:30 CEST 172.30.2.143(49054) user db 0 4a12bce6.5a69 - FATAL: the database system is in recovery mode
2009-05-19 16:06:42 CEST 172.30.2.143(49055) user db 0 4a12bcf2.5a6a - FATAL: the database system is in recovery mode
2009-05-19 16:08:15 CEST 172.30.2.143(49056) user db 0 4a12bd4f.5a93 - FATAL: the database system is in recovery mode
2009-05-19 16:09:57 CEST 172.31.2.139(3041) user db 0 4a12bdb5.5b09 - FATAL: the database system is in recovery mode
2009-05-19 16:10:42 CEST 172.31.1.1(2358) user db 0 4a12bde2.5b71 - FATAL: the database system is in recovery mode
tried a clean shutdown...
[root@pgtest1 logs]# service postgresql
stop
Stopping PostgreSQL: pg_ctl: server does not shut down
Stopping PostgreSQL: pg_ctl: server does not shut down
In the log :
2009-05-19 16:13:10 CEST
0 4a092d56.1535 - LOG: received fast shutdown request
[root@pgtest1 logs]# ps auxx | grep
post
postgres 5429 0.0 0.0
78208 816 ? Ss
May12 0:14 /usr/local/pgsql/bin/postmaster -D
/var/pgdata
postgres 5436 0.0 0.0 7572 488 ? Ss May12 0:41 postgres: logger process
postgres 17293 0.8 0.1 79976 2664 ? Ss 13:53 1:19 postgres: user db 172.30.2.143(48960) SELECT
root 24147 0.0 0.0 4204 636 pts/0 S+ 16:26 0:00 grep post
postgres 5436 0.0 0.0 7572 488 ? Ss May12 0:41 postgres: logger process
postgres 17293 0.8 0.1 79976 2664 ? Ss 13:53 1:19 postgres: user db 172.30.2.143(48960) SELECT
root 24147 0.0 0.0 4204 636 pts/0 S+ 16:26 0:00 grep post
[root@pgtest1 logs]# kill -9
17293
[root@pgtest1 logs]# ps auxx | grep
post
root 24149 0.0 0.0 4916 632 pts/0 S+ 16:26 0:00 grep post
root 24149 0.0 0.0 4916 632 pts/0 S+ 16:26 0:00 grep post
in the log
2009-05-19 16:26:21 CEST
0 4a092d56.1535 - LOG: abnormal database system shutdown
[root@pgtest1 logs]# service postgresql start
Starting PostgreSQL: ok
[root@pgtest1 logs]# ps auxx | grep post
postgres 24198 42.0 0.1 78560 3620 ? Ss 16:26 0:00 /usr/local/pgsql/bin/postmaster -D /var/pgdata
postgres 24205 0.0 0.0 7812 660 ? Ss 16:26 0:00 postgres: logger process
postgres 24207 0.0 0.0 78512 824 ? Ss 16:26 0:00 postgres: writer process
postgres 24208 0.0 0.0 78512 792 ? Ss 16:26 0:00 postgres: wal writer process
postgres 24209 0.0 0.0 78560 960 ? Ss 16:26 0:00 postgres: autovacuum launcher process
postgres 24210 0.0 0.0 7952 828 ? Ss 16:26 0:00 postgres: stats collector process
root 24212 0.0 0.0 4076 644 pts/0 R+ 16:26 0:00 grep post
postgres 24198 42.0 0.1 78560 3620 ? Ss 16:26 0:00 /usr/local/pgsql/bin/postmaster -D /var/pgdata
postgres 24205 0.0 0.0 7812 660 ? Ss 16:26 0:00 postgres: logger process
postgres 24207 0.0 0.0 78512 824 ? Ss 16:26 0:00 postgres: writer process
postgres 24208 0.0 0.0 78512 792 ? Ss 16:26 0:00 postgres: wal writer process
postgres 24209 0.0 0.0 78560 960 ? Ss 16:26 0:00 postgres: autovacuum launcher process
postgres 24210 0.0 0.0 7952 828 ? Ss 16:26 0:00 postgres: stats collector process
root 24212 0.0 0.0 4076 644 pts/0 R+ 16:26 0:00 grep post
from the log
:
2009-05-19 16:26:43 CEST 0 4a12c1a3.5e8e -
LOG: database system was interrupted; last known up at 2009-05-19 15:59:31
CEST
2009-05-19 16:26:43 CEST 0 4a12c1a3.5e8e - LOG: database system was not properly shut down; automatic recovery in progress
2009-05-19 16:26:43 CEST 0 4a12c1a3.5e8e - LOG: record with zero length at 274/F0FE7B18
2009-05-19 16:26:43 CEST 0 4a12c1a3.5e8e - LOG: redo is not required
2009-05-19 16:26:43 CEST 0 4a12c1a3.5e91 - LOG: autovacuum launcher started
2009-05-19 16:26:43 CEST 0 4a12c1a1.5e86 - LOG: database system is ready to accept connections
2009-05-19 16:26:43 CEST 0 4a12c1a3.5e8e - LOG: database system was not properly shut down; automatic recovery in progress
2009-05-19 16:26:43 CEST 0 4a12c1a3.5e8e - LOG: record with zero length at 274/F0FE7B18
2009-05-19 16:26:43 CEST 0 4a12c1a3.5e8e - LOG: redo is not required
2009-05-19 16:26:43 CEST 0 4a12c1a3.5e91 - LOG: autovacuum launcher started
2009-05-19 16:26:43 CEST 0 4a12c1a1.5e86 - LOG: database system is ready to accept connections