Hi guys, I've inherited a PITR continuous recovery master/standby server pair.
The continuous recovery and loading of the xlogs seems to work fine, but when
I opted to test the replica bring up, it falls over with signal 6.
Here's an excerpt from the log with log levels set up to debug5:
DEBUG: max_safe_fds = 984, usable_fds = 1000, already_open = 6
LOG: database system was interrupted at 2007-08-31 15:09:40 PDT
LOG: starting archive recovery
LOG: restore_command = "/data/pg/data/recovery.py /data/pg/wal/%f "%p""
DEBUG: executing restore command "/data/pg/data/recovery.py
/data/pg/wal/00000001.history "pg_xlog/RECOVERYHISTORY""
DEBUG: could not restore file "00000001.history" from archive: return code 0
DEBUG: executing restore command "/data/pg/data/recovery.py
/data/pg/wal/00000001000000000000004C.00000020.backup
"pg_xlog/RECOVERYHISTORY""
LOG: restored log file "00000001000000000000004C.00000020.backup" from
archive
DEBUG: executing restore command "/data/pg/data/recovery.py
/data/pg/wal/00000001000000000000004C "pg_xlog/RECOVERYXLOG""
LOG: restored log file "00000001000000000000004C" from archive
LOG: checkpoint record is at 0/4C000020
LOG: redo record is at 0/4C000020; undo record is at 0/0; shutdown FALSE
LOG: next transaction ID: 0/1222; next OID: 24576
LOG: next MultiXactId: 1; next MultiXactOffset: 0
LOG: automatic recovery in progress
LOG: redo starts at 0/4C000070
DEBUG: executing restore command "/data/pg/data/recovery.py
/data/pg/wal/00000001000000000000004D "pg_xlog/RECOVERYXLOG""
LOG: restored log file "00000001000000000000004D" from archive
DEBUG: recovery restart point at 0/4D000020
CONTEXT: xlog redo checkpoint: redo 0/4D000020; undo 0/0; tli 1; xid 0/1232;
oid 24576; multi 1; offset 0; online
DEBUG: executing restore command "/data/pg/data/recovery.py
/data/pg/wal/00000001000000000000004E "pg_xlog/RECOVERYXLOG""
LOG: restored log file "00000001000000000000004E" from archive
DEBUG: recovery restart point at 0/4E000020
CONTEXT: xlog redo checkpoint: redo 0/4E000020; undo 0/0; tli 1; xid 0/1242;
oid 24576; multi 1; offset 0; online
DEBUG: executing restore command "/data/pg/data/recovery.py
/data/pg/wal/00000001000000000000004F "pg_xlog/RECOVERYXLOG""
LOG: restored log file "00000001000000000000004F" from archive
DEBUG: recovery restart point at 0/4F000020
CONTEXT: xlog redo checkpoint: redo 0/4F000020; undo 0/0; tli 1; xid 0/1252;
oid 24576; multi 1; offset 0; online
DEBUG: executing restore command "/data/pg/data/recovery.py
/data/pg/wal/000000010000000000000050 "pg_xlog/RECOVERYXLOG""
DEBUG: could not restore file "000000010000000000000050" from archive: return
code 256
LOG: could not open file "pg_xlog/000000010000000000000050" (log file 0,
segment 80): No such file or directory
LOG: redo done at 0/4F000070
DEBUG: executing restore command "/data/pg/data/recovery.py
/data/pg/wal/00000001000000000000004F "pg_xlog/RECOVERYXLOG""
DEBUG: could not restore file "00000001000000000000004F" from archive: return
code 256
PANIC: could not open file "pg_xlog/00000001000000000000004F" (log file 0,
segment 79): No such file or directory
DEBUG: reaping dead processes
LOG: startup process (PID 31211) was terminated by signal 6
LOG: aborting startup due to startup process failure
DEBUG: proc_exit(1)
DEBUG: shmem_exit(1)
DEBUG: exit(1)
You can see the rest of the log file here:
http://pgsql.privatepaste.com/841mOR5gzm
It seems like everything is happy, except it seems to ask for the 4F log file
more than once.
Directory permissions look good:
total 45
-rw------- 1 postgres postgres 4 Aug 31 11:13 PG_VERSION
-rw------- 1 postgres postgres 167 Aug 31 15:09 backup_label.old
drwx------ 5 postgres postgres 120 Aug 31 14:59 base
drwx------ 2 postgres postgres 768 Aug 31 15:08 global
drwx------ 2 postgres postgres 72 Aug 31 11:13 pg_clog
-rw------- 1 postgres postgres 3279 Aug 31 11:16 pg_hba.conf
-rw------- 1 postgres postgres 1460 Aug 31 11:14 pg_ident.conf
drwx------ 4 postgres postgres 96 Aug 31 11:13 pg_multixact
drwx------ 2 postgres postgres 72 Aug 31 11:13 pg_subtrans
drwx------ 2 postgres postgres 48 Aug 31 11:13 pg_tblspc
drwx------ 2 postgres postgres 48 Aug 31 11:13 pg_twophase
drwx------ 3 postgres postgres 112 Aug 31 15:24 pg_xlog
-rw-r--r-- 1 postgres postgres 15370 Aug 31 14:57 postgresql.conf
-rw------- 1 postgres postgres 35 Aug 31 15:09 postmaster.opts
-rw-r--r-- 1 postgres postgres 67 Aug 31 13:40 recovery.conf
-rwxr-xr-x 1 postgres postgres 1384 Aug 31 13:40 recovery.py
I verified that the files are really being copied into
pg_xlog/RECOVERYXLOG.
This is 8.2.4 on Gentoo, but I observed the same behavior with 8.2.3 on
Gentoo.
We've md5sum'd the postgres binary, diffed the output of pg_config and
everything is the same.
I tried starting up postgres in a newly initdb'd directory and it starts up
fine. I also tried using the conf files from the 'broken' directory in this
new directory and it starts up fine, but if I try to replicate into this
directory, postgres exits with signal 6 just the same.
Conversely, using a default postgresql.conf file in the 'broken' directory
leaves it just as broken.
The master can start/stop just fine with no problems.
What else to look at? I do have a core file, back trace can be found here:
http://pgsql.privatepaste.com/8f0WDAAzz9
Any suggestions much appreciated!
--
Jeff Frost, Owner <jeff@xxxxxxxxxxxxxxxxxxxxxx>
Frost Consulting, LLC http://www.frostconsultingllc.com/
Phone: 650-780-7908 FAX: 650-649-1954
---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match