I'm going through my usual steps for setting up streaming replication on a new pair of servers: modify the configs as appropriate, rsync the data from master to slave, and so on. I have all of this automated with Chef, and it has been pretty bulletproof for a while. Today, however, I ran into this when starting the slave on the new pair:
* Starting PostgreSQL 9.2 database server
* The PostgreSQL server failed to start. Please check the log output:
2013-08-08 23:47:30 GMT LOG: database system was interrupted; last known up at 2013-08-08 23:22:40 GMT
2013-08-08 23:47:30 GMT LOG: entering standby mode
2013-08-08 23:47:30 GMT LOG: WAL file is from different database system
2013-08-08 23:47:30 GMT DETAIL: WAL file database system identifier is 5909892614333033983, pg_control database system identifier is 5909892824786287231.
2013-08-08 23:47:30 GMT LOG: invalid primary checkpoint record
2013-08-08 23:47:30 GMT LOG: invalid secondary checkpoint record
2013-08-08 23:47:30 GMT PANIC: could not locate a valid checkpoint record
2013-08-08 23:47:30 GMT LOG: startup process (PID 10600) was terminated by signal 6: Aborted
2013-08-08 23:47:30 GMT LOG: aborting startup due to startup process failure
And I've been stumped. I've completely nuked my data dirs and started over and gotten the same result, but with different identifier numbers (as I would expect).
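For anyone who wants to pick apart the DETAIL line above, here is a rough Python sketch that pulls out the two system identifiers and compares them. The function names are mine, not part of any PostgreSQL tool. The timestamp decoding rests on an assumption: that the upper 32 bits of a system identifier hold the Unix time (in seconds) at which initdb created the cluster. If that holds, it may hint at which initdb run each identifier came from.

```python
import re
from datetime import datetime, timezone

# The DETAIL line copied from the log output in this post.
DETAIL = (
    "2013-08-08 23:47:30 GMT DETAIL: WAL file database system identifier is "
    "5909892614333033983, pg_control database system identifier is "
    "5909892824786287231."
)

PATTERN = re.compile(
    r"WAL file database system identifier is (\d+), "
    r"pg_control database system identifier is (\d+)"
)


def parse_identifiers(line):
    """Return (wal_identifier, pg_control_identifier) as integers."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("no system identifiers found in line")
    return int(match.group(1)), int(match.group(2))


def approx_initdb_time(identifier):
    """Decode the approximate cluster creation time.

    ASSUMPTION: the high 32 bits are the Unix time in seconds at initdb.
    The low 32 bits are ignored here.
    """
    return datetime.fromtimestamp(identifier >> 32, tz=timezone.utc)


wal_id, control_id = parse_identifiers(DETAIL)
print("identifiers match:", wal_id == control_id)
print("WAL file cluster created around:  ", approx_initdb_time(wal_id))
print("pg_control cluster created around:", approx_initdb_time(control_id))
```

If the two decoded times differ, that would be consistent with the WAL files and pg_control coming from two separately initialized clusters, though this is only a diagnostic hint and not a fix.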