Hi all,
Given a cluster of three database servers running PostgreSQL 9.1.3 (master, sync slave, async slave), it seems that there are two ways to promote the sync slave to master:
1. Run `pg_ctl promote` on the sync slave (increments the timeline counter)
2. Remove recovery.conf on the sync slave and run `pg_ctl restart` (does not increment the timeline counter)
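For concreteness, the two options can be sketched as a small shell script. This is only a sketch under assumptions: the default `PGDATA` path, the `DRY_RUN` switch, and the function names are made up for illustration, and option 2 renames recovery.conf rather than deleting it so that it can be restored.

```shell
#!/bin/sh
# Sketch only: the PGDATA default and the DRY_RUN switch are assumptions.
PGDATA="${PGDATA:-/var/lib/postgresql/9.1/main}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands instead of running them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Option 1: promote in place. The standby leaves recovery and
# starts a new timeline.
promote_standby() {
    run pg_ctl -D "$PGDATA" promote
}

# Option 2: take recovery.conf out of the way and restart. Per the
# description above, the server then comes up as a master without
# incrementing the timeline.
restart_as_master() {
    run mv "$PGDATA/recovery.conf" "$PGDATA/recovery.conf.disabled"
    run pg_ctl -D "$PGDATA" -m fast restart
}
```

With `DRY_RUN=1`, calling `promote_standby` or `restart_as_master` only prints the commands. Running `pg_controldata` against the data directory before and after shows the latest checkpoint's TimeLineID, which is one way to confirm whether the timeline changed.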
The sync slave becomes master more quickly using `pg_ctl promote`, but every other server in the cluster then needs a new base backup because of the incremented timeline.
2ndQuadrant's repmgr uses the second option so that the async slave can "follow" the new master, saving you from having to do a new base backup. Additionally, the old master is able to start streaming replication from the new master without a new base backup. (Repmgr does not actually support the latter behavior out of the box, but it seemed to work.)
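As a rough illustration of what "following" the new master involves on the async slave (or on the old master when it rejoins), the sketch below writes a minimal recovery.conf pointing at the new master. It is not repmgr's actual code: the helper name `write_recovery_conf`, the `replication` user, and port 5432 are assumptions. The server would then need a restart to pick up the file.

```shell
#!/bin/sh
# Hypothetical helper, not part of repmgr or PostgreSQL.
# Usage: write_recovery_conf <data directory> <new master host>
write_recovery_conf() {
    datadir="$1"
    new_master_host="$2"
    # standby_mode and primary_conninfo are standard recovery.conf
    # settings in 9.1; the user and port here are assumptions.
    cat > "$datadir/recovery.conf" <<EOF
standby_mode = 'on'
primary_conninfo = 'host=$new_master_host port=5432 user=replication'
EOF
}
```

For example, `write_recovery_conf "$PGDATA" new-master.example.com` followed by `pg_ctl -D "$PGDATA" restart`, where the host name is a placeholder.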
So, given a hard failure (e.g. power loss) of the master, `pg_ctl promote` provides availability more quickly, but `pg_ctl restart` provides data redundancy more quickly. Is this an accurate assessment of the tradeoffs between the two approaches? I've found discussion on the mailing lists about future support for slaves following a timeline change after a new master completes recovery, but I have been unable to find anything discussing the approach used by repmgr. Are there risks associated with the `pg_ctl restart` approach, or is it safe to use?
Cheers,
Dave