I think I found the problem. These instances were cloned from another replica, and that replica had .ready files for all of those old WAL files in its pg_xlog/archive_status/ directory. Removing those .ready files (being careful not to remove the ones for current WAL files) seems to have fixed the problem. I also verified that those old .ready files did NOT exist on the prod master/primary. The mtime on those files coincides with a big server migration that was done last year.
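For anyone hitting the same thing, here is a minimal Python sketch of how the leftover files could be listed for review before removing anything. It is illustrative only, not the commands actually used: the function name and the `oldest_wanted` cutoff are hypothetical, and it only compares segments on the same timeline as the cutoff.

```python
import re

# Plain WAL segment status files: 8 hex digits each for timeline, log, segment.
READY_RE = re.compile(r"^([0-9A-F]{8})([0-9A-F]{8})([0-9A-F]{8})\.ready$")

def stale_ready_files(names, oldest_wanted):
    """Return the .ready names whose segment sorts before oldest_wanted.

    names: file names from pg_xlog/archive_status (e.g. os.listdir output).
    oldest_wanted: hypothetical cutoff, the oldest WAL segment name that
    should still be archived. Only the cutoff's timeline is considered.
    """
    cutoff = tuple(int(oldest_wanted[i:i + 8], 16) for i in (0, 8, 16))
    stale = []
    for name in sorted(names):
        m = READY_RE.match(name)
        if m is None:
            continue  # .done files, .history/.backup markers, anything else
        tli, log, seg = (int(g, 16) for g in m.groups())
        if tli == cutoff[0] and (log, seg) < cutoff[1:]:
            stale.append(name)
    return stale
```

The function only returns names and deletes nothing, so the list can be checked by hand against the current WAL position first.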
Don.
On Mon, Dec 4, 2017 at 3:44 PM, Don Seiler <don@xxxxxxxxx> wrote:
I'm setting up a test primary/standby pair from two clones of a prod DR standby. The steps were as follows:
- Create DB01 and DB02 from the same DR backup
- Let them run through crash recovery.
- Change DB02 to use DB01 as its master for streaming replication
- Open DB01 as a new master, archiving its WALs
This all seems to be fine, except for DB01 trying to archive WALs. It keeps complaining that it can't archive WAL files that, judging from the ID number in the file name, are really, really old. In this case, the current WAL file at the time of opening the DB was 000000010000121B00000095. The DB then created WAL files 000000010000121B00000096 and 000000010000121B00000097. However, in the server log I see warnings about being unable to archive WAL file 000000010000000000000001! On a lark, I did a "touch 000000010000000000000001" in pg_xlog, which it seemed to archive, and then it asked for more old file names. For example, these were the next few:
- 000000010000045700000047
- 000000010000046E00000035
- 00000001000004740000007A
- etc
These are nowhere close to the WAL files that we've been processing today or even this past month. Why is it looking to archive these? Is there a way I can tell it to skip/forget these so it can start archiving the current set?
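To make the gap concrete, a WAL segment name can be split into three 8-hex-digit fields: timeline, log, and segment. The following is a small Python sketch of that decoding; `parse_wal_name` is a hypothetical helper, not a PostgreSQL tool.

```python
import re

WAL_RE = re.compile(r"^([0-9A-F]{8})([0-9A-F]{8})([0-9A-F]{8})$")

def parse_wal_name(name):
    """Split a 24-hex-digit WAL segment name into (timeline, log, seg) ints."""
    m = WAL_RE.match(name)
    if m is None:
        raise ValueError("not a plain WAL segment name: %r" % name)
    return tuple(int(g, 16) for g in m.groups())

current = parse_wal_name("000000010000121B00000095")
requested = [
    "000000010000000000000001",
    "000000010000045700000047",
    "000000010000046E00000035",
    "00000001000004740000007A",
]
for name in requested:
    tli, log, seg = parse_wal_name(name)
    # Same timeline as the current file, but a much lower log number.
    print(name, "timeline", tli, "log", log, "vs current log", current[1])
```

All of the requested files are on the same timeline as the current one, so only the log field shows how far back they are.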
Don Seiler
www.seiler.us