Hi all,

Today I tried a point-in-time recovery (PITR) of a PostgreSQL cluster. I think it went fine (although I'm not sure), but I got some warnings afterwards:

2010-07-15 23:01:48 CEST [22907]: [2-1] user=,db= LOG: starting archive recovery
2010-07-15 23:01:48 CEST [22907]: [3-1] user=,db= LOG: restore_command = '/opt/postgres/data/dov/data/pitr_ssh.sh %p %f'
2010-07-15 23:01:48 CEST [22907]: [4-1] user=,db= LOG: recovery_target_time = '2010-07-15 14:00:00+02'
/usr/bin/nsr/recover -d /zones/postgres-on-charlotte/fs/dov/data/pg_xlog -a /zones/postgres-on-charlotte/root/opt/postgres/data/dov/data/pg_xlog/00000001.history
6469:recover: /zones/postgres-on-charlotte/root/opt/postgres/data/dov/data/pg_xlog/00000001.history not in index
67847:recover: Failed to recover files: Nothing to recover
/usr/bin/nsr/recover -d /zones/postgres-on-charlotte/fs/dov/data/pg_xlog -a /zones/postgres-on-charlotte/root/opt/postgres/data/dov/data/pg_xlog/00000001000000000000000C.00000020.backup
Recovering 1 file from /zones/postgres-on-charlotte/root/opt/postgres/data/dov/data/pg_xlog/ into /zones/postgres-on-charlotte/fs/dov/data/pg_xlog
Requesting 1 file(s), this may take a while...
./00000001000000000000000C.00000020.backup
Received 1 file(s) from NSR server `backup-master1'
Recover completion time: Thu Jul 15 23:01:59 2010
2010-07-15 23:01:59 CEST [22907]: [5-1] user=,db= LOG: restored log file "00000001000000000000000C.00000020.backup" from archive
/usr/bin/nsr/recover -d /zones/postgres-on-charlotte/fs/dov/data/pg_xlog -a /zones/postgres-on-charlotte/root/opt/postgres/data/dov/data/pg_xlog/00000001000000000000000C
Recovering 1 file from /zones/postgres-on-charlotte/root/opt/postgres/data/dov/data/pg_xlog/ into /zones/postgres-on-charlotte/fs/dov/data/pg_xlog
Requesting 1 file(s), this may take a while...
Total estimated disk space needed for recover is 5 KB.
./00000001000000000000000C
Received 1 file(s) from NSR server `backup-master1'
Recover completion time: Thu Jul 15 23:02:02 2010
2010-07-15 23:02:02 CEST [22907]: [6-1] user=,db= LOG: restored log file "00000001000000000000000C" from archive
2010-07-15 23:02:02 CEST [22907]: [7-1] user=,db= LOG: automatic recovery in progress
2010-07-15 23:02:02 CEST [22907]: [8-1] user=,db= LOG: redo starts at 0/C000068, consistency will be reached at 0/C000088
2010-07-15 23:02:02 CEST [22907]: [9-1] user=,db= LOG: consistent recovery state reached
/usr/bin/nsr/recover -d /zones/postgres-on-charlotte/fs/dov/data/pg_xlog -a /zones/postgres-on-charlotte/root/opt/postgres/data/dov/data/pg_xlog/00000001000000000000000D
Recovering 1 file from /zones/postgres-on-charlotte/root/opt/postgres/data/dov/data/pg_xlog/ into /zones/postgres-on-charlotte/fs/dov/data/pg_xlog
Requesting 1 file(s), this may take a while...
Total estimated disk space needed for recover is 7169 KB.
./00000001000000000000000D
Received 1 file(s) from NSR server `backup-master1'
Recover completion time: Thu Jul 15 23:02:10 2010
2010-07-15 23:02:10 CEST [22907]: [10-1] user=,db= LOG: restored log file "00000001000000000000000D" from archive
2010-07-15 23:02:12 CEST [22907]: [11-1] user=,db= LOG: recovery stopping before commit of transaction 1003, time 2010-07-15 14:46:23.65437+02
2010-07-15 23:02:12 CEST [22907]: [12-1] user=,db= LOG: redo done at 0/D0010F0
/usr/bin/nsr/recover -d /zones/postgres-on-charlotte/fs/dov/data/pg_xlog -a /zones/postgres-on-charlotte/root/opt/postgres/data/dov/data/pg_xlog/00000002.history
6469:recover: /zones/postgres-on-charlotte/root/opt/postgres/data/dov/data/pg_xlog/00000002.history not in index
67847:recover: Failed to recover files: Nothing to recover
2010-07-15 23:02:13 CEST [22907]: [13-1] user=,db= LOG: selected new timeline ID: 2
/usr/bin/nsr/recover -d /zones/postgres-on-charlotte/fs/dov/data/pg_xlog -a
/zones/postgres-on-charlotte/root/opt/postgres/data/dov/data/pg_xlog/00000001.history
6469:recover: /zones/postgres-on-charlotte/root/opt/postgres/data/dov/data/pg_xlog/00000001.history not in index
67847:recover: Failed to recover files: Nothing to recover
2010-07-15 23:02:15 CEST [22907]: [14-1] user=,db= LOG: archive recovery complete
2010-07-15 23:02:15 CEST [25436]: [1-1] user=,db= LOG: checkpoint starting: end-of-recovery immediate wait
2010-07-15 23:02:15 CEST [25436]: [2-1] user=,db= LOG: checkpoint complete: wrote 6 buffers (0.1%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=0.000 s, sync=0.009 s, total=0.180 s
2010-07-15 23:02:15 CEST [27805]: [1-1] user=,db= LOG: autovacuum launcher started
2010-07-15 23:02:15 CEST [22886]: [5-1] user=,db= LOG: database system is ready to accept connections

I think everything went fine up to this point. If not, please tell me what I did wrong. From here on, the log file fills up with messages like these:

3817:save: Using backup-master1 as server
/zones/postgres-on-charlotte/root/opt/postgres/data/dov/data/pg_xlog/00000001000000000000000E: No such file or directory
2010-07-15 23:02:16: BACKUP WAL: ERROR (1) - /usr/bin/nsr/save -b PGLOGS /zones/postgres-on-charlotte/root/opt/postgres/data/dov/data/pg_xlog/00000001000000000000000E
2010-07-15 23:02:16 CEST [27810]: [1-1] user=,db= LOG: archive command failed with exit code 1
2010-07-15 23:02:16 CEST [27810]: [2-1] user=,db= DETAIL: The failed archive command was: /opt/postgres/dba/scripts/backup_WAL-135.sh pg_xlog/00000001000000000000000E
3817:save: Using backup-master1 as server
/zones/postgres-on-charlotte/root/opt/postgres/data/dov/data/pg_xlog/00000001000000000000000E: No such file or directory
2010-07-15 23:02:18: BACKUP WAL: ERROR (1) - /usr/bin/nsr/save -b PGLOGS /zones/postgres-on-charlotte/root/opt/postgres/data/dov/data/pg_xlog/00000001000000000000000E
2010-07-15 23:02:18 CEST [27810]: [3-1] user=,db= LOG: archive command failed with exit code 1
2010-07-15 23:02:18 CEST [27810]: [4-1] user=,db= DETAIL: The failed archive command was: /opt/postgres/dba/scripts/backup_WAL-135.sh pg_xlog/00000001000000000000000E
3817:save: Using backup-master1 as server
/zones/postgres-on-charlotte/root/opt/postgres/data/dov/data/pg_xlog/00000001000000000000000E: No such file or directory
2010-07-15 23:02:20: BACKUP WAL: ERROR (1) - /usr/bin/nsr/save -b PGLOGS /zones/postgres-on-charlotte/root/opt/postgres/data/dov/data/pg_xlog/00000001000000000000000E
2010-07-15 23:02:20 CEST [27810]: [5-1] user=,db= LOG: archive command failed with exit code 1
2010-07-15 23:02:20 CEST [27810]: [6-1] user=,db= DETAIL: The failed archive command was: /opt/postgres/dba/scripts/backup_WAL-135.sh pg_xlog/00000001000000000000000E
2010-07-15 23:02:20 CEST [27810]: [7-1] user=,db= WARNING: transaction log file "00000001000000000000000E" could not be archived: too many failures
3817:save: Using backup-master1 as server

Archiving WAL file 00000001000000000000000E fails because the file does not exist in pg_xlog. Does anyone know why PostgreSQL tries to archive it? The only WAL file I could find in pg_xlog is 00000002000000000000000D (timeline ID 2).

Next I looked for 00000001000000000000000E on our backup server. It does not seem to exist there (or it was never backed up).

Finally, I created an empty 00000001000000000000000E in pg_xlog. PostgreSQL archived the fake WAL file, so I got rid of the warnings.

Did I do something wrong during the backup? Or during the recovery?

Kind regards,
Guy
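
P.S. To make the timeline question concrete, here is a minimal Python sketch of how I read the two file names. It assumes the usual WAL segment naming of 24 hex digits: 8 for the timeline ID, 8 for the log ID and 8 for the segment number. The names WalName and parse_wal_name are my own, purely for illustration; they are not part of PostgreSQL.

```python
from typing import NamedTuple


class WalName(NamedTuple):
    """Fields of a WAL segment file name (illustrative helper, not a PostgreSQL API)."""
    timeline: int
    log: int
    segment: int


def parse_wal_name(name: str) -> WalName:
    """Split a 24-hex-digit WAL segment file name into its three 8-digit fields."""
    if len(name) != 24:
        raise ValueError(f"expected 24 hex digits, got {name!r}")
    # int(..., 16) raises ValueError if a field is not valid hexadecimal.
    return WalName(
        timeline=int(name[0:8], 16),
        log=int(name[8:16], 16),
        segment=int(name[16:24], 16),
    )


# The file the archiver keeps asking for, and the only file present in pg_xlog.
missing = parse_wal_name("00000001000000000000000E")
present = parse_wal_name("00000002000000000000000D")

print(missing)  # WalName(timeline=1, log=0, segment=14)
print(present)  # WalName(timeline=2, log=0, segment=13)
```

So the file the archiver wants belongs to the old timeline 1 and is one segment past the last one that was restored (0D), while the only file actually present is segment 0D on the new timeline 2.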