Hello,
We have issue with point-in-time recovery. When applying WAL files on restored postgresql server (we used pg_basebackup to create full cluster backup) we got "invalid magic number 0000 in log file" error. The recovery process applied WAL files from archive location and it was set to specific point in time.
In recovery.conf file were parameters set to:
restore_command = 'cp /db/archive/transaction_logs/%f %p'
recovery_target_time = '2014-06-11 16:45:00'
The recovery stopped at
2014-06-16 15:40:30.514 MDT [32021]: [16580-1] LOG: restored log file "0000001A00001172000000D1" from archive
2014-06-16 15:40:35.957 MDT [32021]: [16581-1] LOG: restored log file "0000001A00001172000000D2" from archive
2014-06-16 15:40:46.457 MDT [32021]: [16582-1] LOG: invalid magic number 0000 in log file 4466, segment 210, offset 6594560
2014-06-16 15:40:46.457 MDT [32021]: [16583-1] LOG: redo done at 1172/D2649068
2014-06-16 15:40:46.457 MDT [32021]: [16584-1] LOG: last completed transaction was at log time 2014-06-11 12:34:13.496717-06
2014-06-16 15:40:46.481 MDT [32021]: [16585-1] LOG: restored log file "0000001A00001172000000D2" from archive
cp: cannot stat `/db/archive/transaction_logs/0000001B.history': No such file or directory
2014-06-16 15:40:46.499 MDT [32021]: [16586-1] LOG: selected new timeline ID: 27
cp: cannot stat `/db/archive/transaction_logs/0000001A.history': No such file or directory
2014-06-16 15:40:49.766 MDT [32021]: [16587-1] LOG: archive recovery complete
2014-06-16 15:55:16.171 MDT [6791]: [1-1] LOG: autovacuum launcher started
2014-06-16 15:55:16.172 MDT [32019]: [1-1] LOG: database system is ready to accept connections
What is interesting, the postgresql server streaming replica/slave, which is set to asynchronous replication had no issue. When we checked the log from replica/slave there was no issue with applying the WAL stream on slave.
Streaming replica/slave log:
2014-06-11 12:20:40.832 MDT [11787] - [] - [] - []: [26862-1] LOG: restartpoint complete: wrote 77860 buffers (7.4%); 0 transaction log file(s) added, 0 removed, 23 recycled; write=719.761 s, sync=0.983 s, total=720.748 s; sync files=5119, longest=0.261 s, average=0.000 s
2014-06-11 12:20:40.832 MDT [11787] - [] - [] - []: [26863-1] LOG: recovery restart point at 1172/6C678380
2014-06-11 12:20:40.832 MDT [11787] - [] - [] - []: [26864-1] DETAIL: last completed transaction was at log time 2014-06-11 12:20:45.928542-06
2014-06-11 12:23:40.170 MDT [11787] - [] - [] - []: [26865-1] LOG: restartpoint starting: time
2014-06-11 12:35:43.936 MDT [11787] - [] - [] - []: [26866-1] LOG: restartpoint complete: wrote 110827 buffers (10.6%); 0 transaction log file(s) added, 0 removed, 30 recycled; write=719.931 s, sync=3.832 s, total=723.766 s; sync files=6957, longest=1.349 s, average=0.000 s
2014-06-11 12:35:43.936 MDT [11787] - [] - [] - []: [26867-1] LOG: recovery restart point at 1172/88B75B48
2014-06-11 12:35:43.936 MDT [11787] - [] - [] - []: [26868-1] DETAIL: last completed transaction was at log time 2014-06-11 12:35:49.145232-06
2014-06-11 12:35:44.095 MDT [11787] - [] - [] - []: [26869-1] LOG: restartpoint starting: xlog
2014-06-11 12:47:05.041 MDT [11787] - [] - [] - []: [26870-1] LOG: restartpoint complete: wrote 176758 buffers (16.9%); 0 transaction log file(s) added, 0 removed, 28 recycled; write=670.320 s, sync=10.588 s, total=680.945 s; sync files=7398, longest=3.899 s, average=0.001 s
2014-06-11 12:47:05.041 MDT [11787] - [] - [] - []: [26871-1] LOG: recovery restart point at 1172/BE5F4928
2014-06-11 12:47:05.041 MDT [11787] - [] - [] - []: [26872-1] DETAIL: last completed transaction was at log time 2014-06-11 12:47:10.216659-06
2014-06-11 12:47:05.224 MDT [11787] - [] - [] - []: [26873-1] LOG: restartpoint starting: xlog
2014-06-11 12:58:27.067 MDT [11787] - [] - [] - []: [26874-1] LOG: restartpoint complete: wrote 495857 buffers (47.3%); 0 transaction log file(s) added, 0 removed, 54 recycled; write=670.682 s, sync=11.148 s, total=681.843 s; sync files=5627, longest=4.773 s, average=0.001 s
What could cause this issue? I suspect one possibility could be the WAL file itself was corrupted when it was archived.
Is it possible to recover/fix WAL file?
Radovan Jablonovsky | SaaS DBA | Phone 1-403-262-6519 (ext. 256) | Fax 1-403-233-8046
Replicon | Hassle-Free Time & Expense Management Software - 7,300 Customers - 70 Countries
www.replicon.com | facebook | twitter | blog | contact us
We are hiring! | search jobs