WAL archive stopping too early

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I'm trying to set up WAL archiving on my server, and replicate my live database to a hot-spare server using online backups.
However, the postmaster on the hot-spare is stopping too early during the recovery.

In my archived xlog folder on the live server, I have the following xlogs (note the backup history file):

-rw-------  1 postgresql datahandler 16777216 2006-02-16 15:14 000000010000000900000019
-rw-------  1 postgresql datahandler      273 2006-02-16 15:14 000000010000000900000019.0035C2F4.backup
-rw-------  1 postgresql datahandler 16777216 2006-02-16 15:14 00000001000000090000001A
-rw-------  1 postgresql datahandler 16777216 2006-02-16 15:50 00000001000000090000001B
-rw-------  1 postgresql datahandler 16777216 2006-02-16 15:50 00000001000000090000001C
-rw-------  1 postgresql datahandler 16777216 2006-02-16 15:50 00000001000000090000001D
-rw-------  1 postgresql datahandler 16777216 2006-02-16 15:50 00000001000000090000001E
-rw-------  1 postgresql datahandler 16777216 2006-02-16 15:25 00000001000000090000001F
-rw-------  1 postgresql datahandler 16777216 2006-02-16 15:35 000000010000000900000020
-rw-------  1 postgresql datahandler 16777216 2006-02-16 15:38 000000010000000900000021
-rw-------  1 postgresql datahandler 16777216 2006-02-16 15:44 000000010000000900000022
-rw-------  1 postgresql datahandler 16777216 2006-02-16 15:46 000000010000000900000023

I have replicated this folder to the xlog_archive folder on the hot-spare.  I have taken a tar of the entire $PGDATA on the live server (during a start_backup and stop_backup), and un-tarred it on the hot-spare (changing the owner/permissions as appropriate.)  Note the pg_xlog was not included in the tar, so I've created the correct structure on the spare (as a symlink to a separate physical disk.)

In my recovery.conf, I've added the single recovery_command and changed the owner to my superuser (postgresql.)

Then when I start the postmaster on the spare, here's what follows in the log:

2006-02-17 08:59:40 GMT LOG:  starting archive recovery
2006-02-17 08:59:40 GMT LOG:  restore_command = "cp /mndata/logs/xlog_archive/%f "%p""
cp: cannot stat `/mndata/logs/xlog_archive/00000001.history': No such file or directory
2006-02-17 08:59:40 GMT LOG:  restored log file "000000010000000900000019.0035C2F4.backup" from archive
2006-02-17 08:59:40 GMT LOG:  restored log file "000000010000000900000019" from archive
2006-02-17 08:59:40 GMT LOG:  checkpoint record is at 9/1935C2F4
2006-02-17 08:59:40 GMT LOG:  redo record is at 9/1935C2F4; undo record is at 0/0; shutdown FALSE
2006-02-17 08:59:40 GMT LOG:  next transaction ID: 795500; next OID: 37184
2006-02-17 08:59:40 GMT LOG:  next MultiXactId: 1; next MultiXactOffset: 0
2006-02-17 08:59:40 GMT LOG:  automatic recovery in progress
2006-02-17 08:59:40 GMT LOG:  redo starts at 9/1935C338
2006-02-17 08:59:42 GMT LOG:  restored log file "00000001000000090000001A" from archive
2006-02-17 08:59:44 GMT LOG:  restored log file "00000001000000090000001B" from archive
2006-02-17 09:00:09 GMT LOG:  record with zero length at 9/1BEB2430
2006-02-17 09:00:09 GMT LOG:  redo done at 9/1BEB23EC
2006-02-17 09:00:09 GMT LOG:  restored log file "00000001000000090000001B" from archive
2006-02-17 09:00:09 GMT LOG:  archive recovery complete
2006-02-17 09:00:20 GMT LOG:  database system is ready
2006-02-17 09:00:20 GMT LOG:  transaction ID wrap limit is 1074536488, limited by database "dspam"

Note there are plenty more files ready to run in the archive, and the contents of the backup label only mention 1A as the final WAL file of the backup, so it is starting to read the next WAL file, but stopping for some reason.  Is it because of the "record with zero length" that's causing it to fail?  Also why is it restoring 1B from the archive twice?  There is a committed transaction in the live database that is not appearing in the hot-spare after the restore, which is how I know it's not completing correctly.

Here's the backup label file contents:

START WAL LOCATION: 9/1935C2F4 (file 000000010000000900000019)
STOP WAL LOCATION: 9/1AC3DCF8 (file 00000001000000090000001A)
CHECKPOINT LOCATION: 9/1935C2F4
START TIME: 2006-02-16 11:39:57 GMT
LABEL: /perth/backup/db/base_20060216.tar.gz
STOP TIME: 2006-02-16 12:13:49 GMT

Can anyone shed any light?

Many thanks

Andy Shellam
the mail network - server support

[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux