From: johnlumby@xxxxxxxxxxx To: pgsql-general-owner@xxxxxxxxxxxxxx Subject: how to switch old replication Master to new Standby after promoting old Standby Date: Mon, 14 Mar 2016 13:23:29 -0400 In a scenario involving replication where no failure occurs but I want to interchange Master <->Standby, *and* want to avoid making another full base backup and rsyncing it across, I have found it easy to promote old Standby to new Master with pg_ctl promote, but very difficult to restart the old Primary as a new Standby *without* performing new base backup. Assume current 9.5.1 and using streaming replication with a named replication slot if relevant Second assumption - I am able to temporarily prevent any relational updates to the database before I start the switchover e.g. the default_transaction_read_only = on setting together with being able to control what transactions do if anything So I reach a point where both systems have postgresql running without any replication, both have identical content in all databases (that I can control), and I am willing to tolerate short restarts if need be and also to scp/rsync the contents of pg_xlog and other small files but *not* the entire cluster directory or any database base directories. What do I do next? Here is what I have found seems to work but I am not sure it is robust: 1. shut down both new Master and intended-to-be-new-Standby 2. on intended-to-be-new-Standby, remove the entire content of pg_xlog and the global/pg_control 3. from new Master , tar + scp the entire content of pg_xlog and the global/pg_control to intended-to-be-new-Standby 4. create intended-to-be-new-Standby's recovery.conf, specifying recovery_target_timeline = 'latest' (but I think it works with this setting omitted) 5. start new Master 6. start new Standby and up it comes in its postgres log(**) I see LOG: 00000: database system was shut down at 2016-03-14 16:41:24 GMT LOCATION: StartupXLOG, xlog.c:5936 LOG: 00000: entering standby mode LOCATION: StartupXLOG, xlog.c:6016 LOG: 00000: consistent recovery state reached at C/480000D0 LOCATION: CheckRecoveryConsistency, xlog.c:7493 LOG: 00000: invalid record length at C/480000D0 LOCATION: ReadRecord, xlog.c:3960 LOG: 00000: started streaming WAL from primary at C/48000000 on timeline 3 LOCATION: WalReceiverMain, walreceiver.c:358 LOG: 00000: redo starts at C/480000D0 LOCATION: StartupXLOG, xlog.c:6700 Is the invalid record length msg anything to worry about? But this method is purely empirical. Is it robust? Anyone have any better recommendations? Cheers, John Lumby (**Note this log was from a pre-release 9.5, 9.5alpha2 I don't have 9.5.1 to hand at present -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general