I am working with PostgreSQL 9.1.3. So far I have:

- Set up a master and a standby.
- Initiated replication and verified that it was occurring.
- Failed over from master to standby and verified that the database could be updated on the new master.

I then configured the former standby as a master and the former master as a standby. After removing all previous data on the new standby with rm -rf *, I ran the following command from the new master to transfer data to the new standby:

    rsync -av --exclude postgresql.conf --exclude pg_xlog /var/lib/pgsql/9.1/data/* 192.7.143.213:/var/lib/pgsql/9.1/data

I then started the new standby first and, after a short wait, the new master. The standby starts and initially logs connection errors, but after the master is started the standby appears to start streaming replication and then fails with a timeline error. On the new master all looks good and the database can be accessed via psql.

Here is the content of my recovery.conf file:

    standby_mode=on
    primary_conninfo='host=192.7.143.111 port=5432 user=ruser password=ruserpass'
    trigger_file='/var/lib/pgsql/9.1/data/failover'
    recovery_target_timeline = 'latest'

(The effect is the same when the recovery_target_timeline line is removed.)

Before New Master Starts:

    2012-07-11 10:43:04.476 EDT---10876-LOCATION: libpqrcv_connect, libpqwalreceiver.c:102
    2012-07-11 10:43:09.476 EDT---10877-FATAL: XX000: could not connect to the primary server: could not connect to server: Connection refused
        Is the server running on host "192.7.143.111" and accepting TCP/IP connections on port 5432?
    2012-07-11 10:43:09.476 EDT---10877-LOCATION: libpqrcv_connect, libpqwalreceiver.c:102
    2012-07-11 10:43:14.476 EDT---10878-FATAL: XX000: could not connect to the primary server: could not connect to server: Connection refused
        Is the server running on host "192.7.143.111" and accepting TCP/IP connections on port 5432?

After New Master Starts:

    2012-07-11 10:43:14.476 EDT---10878-LOCATION: libpqrcv_connect, libpqwalreceiver.c:102
    2012-07-11 10:43:19.479 EDT---10879-LOG: 00000: streaming replication successfully connected to primary
    2012-07-11 10:43:19.479 EDT---10879-LOCATION: libpqrcv_connect, libpqwalreceiver.c:171
    2012-07-11 10:43:20.749 EDT---10738-LOG: 00000: unexpected timeline ID 1 in log file 0, segment 2, offset 0
    2012-07-11 10:43:20.749 EDT---10738-LOCATION: ValidXLOGHeader, xlog.c:4123
    2012-07-11 10:43:20.749 EDT---10879-FATAL: 57P01: terminating walreceiver process due to administrator command
    2012-07-11 10:43:20.749 EDT---10879-LOCATION: ProcessWalRcvInterrupts, walreceiver.c:150
    2012-07-11 10:43:20.849 EDT---10738-LOG: 00000: unexpected timeline ID 1 in log file 0, segment 2, offset 0
    2012-07-11 10:43:20.849 EDT---10738-LOCATION: ValidXLOGHeader, xlog.c:4123
    2012-07-11 10:43:24.849 EDT---10738-LOG: 00000: unexpected timeline ID 1 in log file 0, segment 2, offset 0
    2012-07-11 10:43:24.849 EDT---10738-LOCATION: ValidXLOGHeader, xlog.c:4123
    2012-07-11 10:43:29.849 EDT---10738-LOG: 00000: unexpected timeline ID 1 in log file 0, segment 2, offset 0
    2012-07-11 10:43:29.849 EDT---10738-LOCATION: ValidXLOGHeader, xlog.c:4123
    2012-07-11 10:43:34.850 EDT---10738-LOG: 00000: unexpected timeline ID 1 in log file 0, segment 2, offset 0
    2012-07-11 10:43:34.851 EDT---10738-LOCATION: ValidXLOGHeader, xlog.c:4123
    2012-07-11 10:43:38.415 EDT---10731-LOG: 00000: received fast shutdown request
    2012-07-11 10:43:38.415 EDT---10731-LOCATION: pmdie, postmaster.c:2251
    2012-07-11 10:43:38.415 EDT---10738-LOG: 00000: unexpected timeline ID 1 in log file 0, segment 2, offset 0
    2012-07-11 10:43:38.415 EDT---10738-LOCATION: ValidXLOGHeader, xlog.c:4123
    2012-07-11 10:43:38.416 EDT---10731-LOG: 00000: startup process (PID 10738) exited with exit code 1
    2012-07-11 10:43:38.416 EDT---10731-LOCATION: LogChildExit, postmaster.c:2867
    2012-07-11 10:43:38.416 EDT---10731-LOG: 00000: aborting startup due to startup process failure
    2012-07-11 10:43:38.416 EDT---10731-LOCATION: reaper, postmaster.c:2377

Al Gregorio

"When you're moving in the positive, your destination is the brightest star." Stevie Wonder