Last night one of my databases went down temporarily because of a
segmentation fault.
It has only happened this once, and the database recovered fully
afterwards, but I was wondering whether there is anything I can do to
prevent it from happening again.
It happened while the backup was running (pg_dump and pg_dumpall).
Here are some details from the logs:
The system is running Ubuntu Linux, and I'm using the PostgreSQL package
from the Dapper repository.
uname -a:
Linux db 2.6.15-26-amd64-server #1 SMP Fri Jul 7 20:02:26 UTC 2006 x86_64 GNU/Linux
select version():
PostgreSQL 8.1.4 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.0.gcc-opt (GCC) 4.0.3 (Ubuntu 4.0.3-1ubuntu5)
pgsql log:
2006-08-16 00:38:22 CEST - LOG: server process (PID 4792) was terminated by signal 11
2006-08-16 00:38:22 CEST - LOG: terminating any other active server processes
2006-08-16 00:38:22 CEST - WARNING: terminating connection because of crash of another server process
2006-08-16 00:38:22 CEST - DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2006-08-16 00:38:22 CEST - HINT: In a moment you should be able to reconnect to the database and repeat your command.
[DETAIL and HINT repeated for every connection]
2006-08-16 00:38:23 CEST - LOG: all server processes terminated; reinitializing
2006-08-16 00:38:23 CEST - LOG: database system was interrupted at 2006-08-16 00:36:21 CEST
2006-08-16 00:38:23 CEST - LOG: checkpoint record is at 5/4F9FDC00
2006-08-16 00:38:23 CEST - LOG: redo record is at 5/4F9B3558; undo record is at 0/0; shutdown FALSE
2006-08-16 00:38:23 CEST - LOG: next transaction ID: 5408607; next OID: 30199
2006-08-16 00:38:23 CEST - LOG: next MultiXactId: 1; next MultiXactOffset: 0
2006-08-16 00:38:23 CEST - LOG: database system was not properly shut down; automatic recovery in progress
2006-08-16 00:38:23 CEST - FATAL: the database system is starting up
2006-08-16 00:38:23 CEST - LOG: redo starts at 5/4F9B3558
2006-08-16 00:38:23 CEST - LOG: record with zero length at 5/4FB63C18
2006-08-16 00:38:23 CEST - LOG: redo done at 5/4FB63BE8
2006-08-16 00:38:26 CEST - LOG: database system is ready
2006-08-16 00:38:26 CEST - LOG: transaction ID wrap limit is 1073864149, limited by database "db"
At 00:36:21, the time at which the system was interrupted, the pgsql log showed the following:
2006-08-16 00:36:21 CEST - LOG: duration: 14673.110 ms statement: EXECUTE <unnamed> [PREPARE: select * from app.insert_unitstat($1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16,$17,$18,$19,$20,$21) as result]
2006-08-16 00:36:21 CEST - LOG: duration: 8730.029 ms statement: EXECUTE <unnamed> [PREPARE: select * from app.insert_unitstat($1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16,$17,$18,$19,$20,$21) as result]
2006-08-16 00:36:21 CEST - LOG: duration: 5982.330 ms statement: EXECUTE <unnamed> [PREPARE: select * from app.insert_unitstat($1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16,$17,$18,$19,$20,$21) as result]
2006-08-16 00:36:21 CEST - LOG: duration: 10404.601 ms statement: EXECUTE <unnamed> [PREPARE: select * from app.insert_unitstat($1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16,$17,$18,$19,$20,$21) as result]
These statements call a PL/pgSQL function, which is invoked via JDBC
using postgresql-8.1-407.jdbc3.jar.
dmesg:
[2425253.737383] postmaster[4792]: segfault at 00002aaab6f0e000 rip 00002aaaab73795b rsp 00007fffff8c9228 error 4
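For reference, the "error 4" at the end of the dmesg line is the x86 page-fault error code reported by the kernel. Below is a minimal Python sketch that decodes its low bits, assuming the standard x86 bit layout (the function name decode_pf_error is purely illustrative). Under that layout, 4 means a user-mode read of a page that was not present, i.e. the backend process (PID 4792, the same PID as in the pgsql log) read from an address that was not mapped:

```python
# Decode the x86 page-fault error code printed in kernel "segfault ... error N"
# lines. Assumes the standard x86 bit layout; decode_pf_error is a hypothetical
# helper written for illustration, not part of any tool.

def decode_pf_error(code: int) -> list:
    """Return a human-readable meaning for each relevant bit of the code."""
    parts = [
        "protection violation" if code & 0x1 else "page not present",  # bit 0
        "write access" if code & 0x2 else "read access",               # bit 1
        "user mode" if code & 0x4 else "kernel mode",                   # bit 2
    ]
    if code & 0x8:
        parts.append("reserved bit set")    # bit 3
    if code & 0x10:
        parts.append("instruction fetch")   # bit 4
    return parts


if __name__ == "__main__":
    # The value from the dmesg line above.
    print(", ".join(decode_pf_error(4)))
```

Running it prints "page not present, read access, user mode".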
Any suggestions?
Thanks,
Poul