Restore problem

Guy Wauters <guy@xxxxxxxxxxxxx> · Fri, 16 Jul 2010 00:05:07 +0200

Hi all,
Today I tried to recover a PostgreSQL cluster using point-in-time recovery (PITR). I think it went fine, although I'm not sure, and I got some warnings afterwards:

2010-07-15 23:01:48 CEST [22907]: [2-1] user=,db= LOG:  starting archive recovery
2010-07-15 23:01:48 CEST [22907]: [3-1] user=,db= LOG:  restore_command = '/opt/postgres/data/dov/data/pitr_ssh.sh %p %f'
2010-07-15 23:01:48 CEST [22907]: [4-1] user=,db= LOG:  recovery_target_time = '2010-07-15 14:00:00+02'
/usr/bin/nsr/recover -d /zones/postgres-on-charlotte/fs/dov/data/pg_xlog -a /zones/postgres-on-charlotte/root/opt/postgres/data/dov/data/pg_xlog/00000001.history
6469:recover: /zones/postgres-on-charlotte/root/opt/postgres/data/dov/data/pg_xlog/00000001.history not in index
67847:recover: Failed to recover files: Nothing to recover

/usr/bin/nsr/recover -d /zones/postgres-on-charlotte/fs/dov/data/pg_xlog -a /zones/postgres-on-charlotte/root/opt/postgres/data/dov/data/pg_xlog/00000001000000000000000C.00000020.backup
Recovering 1 file from /zones/postgres-on-charlotte/root/opt/postgres/data/dov/data/pg_xlog/ into /zones/postgres-on-charlotte/fs/dov/data/pg_xlog
Requesting 1 file(s), this may take a while...
./00000001000000000000000C.00000020.backup
Received 1 file(s) from NSR server `backup-master1'
Recover completion time: Thu Jul 15 23:01:59 2010
2010-07-15 23:01:59 CEST [22907]: [5-1] user=,db= LOG:  restored log file "00000001000000000000000C.00000020.backup" from archive
/usr/bin/nsr/recover -d /zones/postgres-on-charlotte/fs/dov/data/pg_xlog -a /zones/postgres-on-charlotte/root/opt/postgres/data/dov/data/pg_xlog/00000001000000000000000C
Recovering 1 file from /zones/postgres-on-charlotte/root/opt/postgres/data/dov/data/pg_xlog/ into /zones/postgres-on-charlotte/fs/dov/data/pg_xlog
Requesting 1 file(s), this may take a while...
Total estimated disk space needed for recover is 5 KB.
./00000001000000000000000C
Received 1 file(s) from NSR server `backup-master1'
Recover completion time: Thu Jul 15 23:02:02 2010
2010-07-15 23:02:02 CEST [22907]: [6-1] user=,db= LOG:  restored log file "00000001000000000000000C" from archive
2010-07-15 23:02:02 CEST [22907]: [7-1] user=,db= LOG:  automatic recovery in progress
2010-07-15 23:02:02 CEST [22907]: [8-1] user=,db= LOG:  redo starts at 0/C000068, consistency will be reached at 0/C000088
2010-07-15 23:02:02 CEST [22907]: [9-1] user=,db= LOG:  consistent recovery state reached
/usr/bin/nsr/recover -d /zones/postgres-on-charlotte/fs/dov/data/pg_xlog -a /zones/postgres-on-charlotte/root/opt/postgres/data/dov/data/pg_xlog/00000001000000000000000D
Recovering 1 file from /zones/postgres-on-charlotte/root/opt/postgres/data/dov/data/pg_xlog/ into /zones/postgres-on-charlotte/fs/dov/data/pg_xlog
Requesting 1 file(s), this may take a while...
Total estimated disk space needed for recover is 7169 KB.
./00000001000000000000000D
Received 1 file(s) from NSR server `backup-master1'
Recover completion time: Thu Jul 15 23:02:10 2010
2010-07-15 23:02:10 CEST [22907]: [10-1] user=,db= LOG:  restored log file "00000001000000000000000D" from archive
2010-07-15 23:02:12 CEST [22907]: [11-1] user=,db= LOG:  recovery stopping before commit of transaction 1003, time 2010-07-15 14:46:23.65437+02
2010-07-15 23:02:12 CEST [22907]: [12-1] user=,db= LOG:  redo done at 0/D0010F0
/usr/bin/nsr/recover -d /zones/postgres-on-charlotte/fs/dov/data/pg_xlog -a /zones/postgres-on-charlotte/root/opt/postgres/data/dov/data/pg_xlog/00000002.history
6469:recover: /zones/postgres-on-charlotte/root/opt/postgres/data/dov/data/pg_xlog/00000002.history not in index
67847:recover: Failed to recover files: Nothing to recover

2010-07-15 23:02:13 CEST [22907]: [13-1] user=,db= LOG:  selected new timeline ID: 2
/usr/bin/nsr/recover -d /zones/postgres-on-charlotte/fs/dov/data/pg_xlog -a /zones/postgres-on-charlotte/root/opt/postgres/data/dov/data/pg_xlog/00000001.history
6469:recover: /zones/postgres-on-charlotte/root/opt/postgres/data/dov/data/pg_xlog/00000001.history not in index
67847:recover: Failed to recover files: Nothing to recover

2010-07-15 23:02:15 CEST [22907]: [14-1] user=,db= LOG:  archive recovery complete
2010-07-15 23:02:15 CEST [25436]: [1-1] user=,db= LOG:  checkpoint starting: end-of-recovery immediate wait
2010-07-15 23:02:15 CEST [25436]: [2-1] user=,db= LOG:  checkpoint complete: wrote 6 buffers (0.1%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=0.000 s, sync=0.009 s, total=0.180 s
2010-07-15 23:02:15 CEST [27805]: [1-1] user=,db= LOG:  autovacuum launcher started
2010-07-15 23:02:15 CEST [22886]: [5-1] user=,db= LOG:  database system is ready to accept connections

I think everything went fine up to this point. If not, please tell me what I did wrong.

From here on, the log file is filled with messages like this:

3817:save: Using backup-master1 as server
/zones/postgres-on-charlotte/root/opt/postgres/data/dov/data/pg_xlog/00000001000000000000000E: No such file or directory
2010-07-15 23:02:16:    BACKUP WAL: ERROR (1) - /usr/bin/nsr/save -b PGLOGS /zones/postgres-on-charlotte/root/opt/postgres/data/dov/data/pg_xlog/00000001000000000000000E
2010-07-15 23:02:16 CEST [27810]: [1-1] user=,db= LOG:  archive command failed with exit code 1
2010-07-15 23:02:16 CEST [27810]: [2-1] user=,db= DETAIL:  The failed archive command was: /opt/postgres/dba/scripts/backup_WAL-135.sh pg_xlog/00000001000000000000000E
3817:save: Using backup-master1 as server
/zones/postgres-on-charlotte/root/opt/postgres/data/dov/data/pg_xlog/00000001000000000000000E: No such file or directory
2010-07-15 23:02:18:    BACKUP WAL: ERROR (1) - /usr/bin/nsr/save -b PGLOGS /zones/postgres-on-charlotte/root/opt/postgres/data/dov/data/pg_xlog/00000001000000000000000E
2010-07-15 23:02:18 CEST [27810]: [3-1] user=,db= LOG:  archive command failed with exit code 1
2010-07-15 23:02:18 CEST [27810]: [4-1] user=,db= DETAIL:  The failed archive command was: /opt/postgres/dba/scripts/backup_WAL-135.sh pg_xlog/00000001000000000000000E
3817:save: Using backup-master1 as server
/zones/postgres-on-charlotte/root/opt/postgres/data/dov/data/pg_xlog/00000001000000000000000E: No such file or directory
2010-07-15 23:02:20:    BACKUP WAL: ERROR (1) - /usr/bin/nsr/save -b PGLOGS /zones/postgres-on-charlotte/root/opt/postgres/data/dov/data/pg_xlog/00000001000000000000000E
2010-07-15 23:02:20 CEST [27810]: [5-1] user=,db= LOG:  archive command failed with exit code 1
2010-07-15 23:02:20 CEST [27810]: [6-1] user=,db= DETAIL:  The failed archive command was: /opt/postgres/dba/scripts/backup_WAL-135.sh pg_xlog/00000001000000000000000E
2010-07-15 23:02:20 CEST [27810]: [7-1] user=,db= WARNING:  transaction log file "00000001000000000000000E" could not be archived: too many failures
3817:save: Using backup-master1 as server

Saving WAL file 00000001000000000000000E fails because the file does not exist in pg_xlog. Does anyone know why?
The only WAL file I could find in pg_xlog is 00000002000000000000000D (timeline ID 2). Next I looked for 00000001000000000000000E on our backup server, but it does not seem to exist there either (or it was never backed up).
Finally, I created an empty 00000001000000000000000E in pg_xlog. PostgreSQL archived the fake WAL file, and that got rid of the warnings.
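
For reference, here is a minimal sketch of how the two segment names above break down, assuming the usual 24-hex-digit layout (8 digits of timeline ID, 8 of log file number, 8 of segment number). The helper name parse_wal_name is made up for illustration and is not part of PostgreSQL.

```python
import re

def parse_wal_name(name):
    """Split a WAL segment file name into (timeline, log, segment) integers.

    Assumes the 24-hex-digit layout: 8 digits timeline ID, 8 digits log
    file number, 8 digits segment number. Hypothetical helper.
    """
    if not re.fullmatch(r"[0-9A-Fa-f]{24}", name):
        raise ValueError("not a 24-hex-digit WAL segment name: %r" % name)
    return int(name[0:8], 16), int(name[8:16], 16), int(name[16:24], 16)

for name in ("00000001000000000000000E", "00000002000000000000000D"):
    timeline, log, segment = parse_wal_name(name)
    print("%s: timeline=%d log=%d segment=0x%X" % (name, timeline, log, segment))
# 00000001000000000000000E: timeline=1 log=0 segment=0xE
# 00000002000000000000000D: timeline=2 log=0 segment=0xD
```

So the file the archiver keeps asking for is the segment after 0xD on the old timeline 1, while the only file present in pg_xlog is segment 0xD on the new timeline 2.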

Did I do something wrong during the backup, or during the recovery?
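
One way to narrow this down might be to check which segments are flagged for archiving but are missing on disk. The sketch below assumes the archiver picks up its work from "<segment>.ready" marker files in pg_xlog/archive_status; the function name orphaned_ready_markers is invented for this example, and the script only reads the directory, it changes nothing.

```python
import os
import sys

def orphaned_ready_markers(xlog_dir):
    """Return names that have a .ready marker but no file in xlog_dir.

    Assumes markers live in <xlog_dir>/archive_status as "<name>.ready".
    Hypothetical diagnostic helper, read-only.
    """
    status_dir = os.path.join(xlog_dir, "archive_status")
    suffix = ".ready"
    missing = []
    for entry in sorted(os.listdir(status_dir)):
        if entry.endswith(suffix):
            name = entry[:-len(suffix)]
            # The marker exists; check whether the file it refers to does.
            if not os.path.exists(os.path.join(xlog_dir, name)):
                missing.append(name)
    return missing

if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the pg_xlog directory of the restored cluster as the argument.
    for name in orphaned_ready_markers(sys.argv[1]):
        print("marked .ready but missing from pg_xlog:", name)
```

If 00000001000000000000000E shows up in that list, the marker was present even though the segment itself was never restored, which would at least match the repeated "No such file or directory" errors above. Whether that points to the backup or to the recovery is still the open question.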

Kind regards,

 Guy