On Tue, Mar 8, 2016 at 1:48 PM, CS DBA <cs_dba@xxxxxxxxxxxxxxxxxxx> wrote:
I do however have a few questions related to this, I'm interested to find out what others have done here, in particular how do you go about moving end users (assuming a web app is the end user entry point) to point seamlessly to the secondary site? Also how have you all dealt with the possible split brain issue (i.e. we fail over, then the primary site comes back up and existing/old connections to the old site then write to the old master)
While not seamless, you can achieve a pretty good failover time by using DNS servers with a short TTL (under 2 min). On failure, have your monitoring tool fire the failover scripts (promote the postgres server, enable the app server, etc.) and then change the app's DNS record to the secondary site's IP address. In a very short time you should have your users working on the secondary site.
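As a rough sketch of what such a failover script could look like (not a finished tool): the function names, the retry count and the delay below are my own assumptions, and `update_dns` is a hypothetical callable you would write against your DNS provider's API. The only external command used is `pg_ctl promote`, which has to run as the postgres user on the standby.

```python
import socket
import subprocess
import time
from typing import Callable


def primary_is_up(host: str, port: int = 5432, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the primary's Postgres port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def promote_standby(data_dir: str) -> None:
    """Promote the local standby with pg_ctl; raises if the command fails."""
    subprocess.run(["pg_ctl", "promote", "-D", data_dir], check=True)


def fail_over(
    check_primary: Callable[[], bool],
    promote: Callable[[], None],
    update_dns: Callable[[], None],  # hypothetical, provider-specific
    attempts: int = 3,
    delay: float = 5.0,
) -> bool:
    """Fail over only if the primary fails `attempts` checks in a row.

    Returns True if the failover steps were run, False if the primary
    answered one of the checks.
    """
    for i in range(attempts):
        if check_primary():
            return False
        if i < attempts - 1:
            time.sleep(delay)  # wait before re-checking to ride out short blips
    promote()     # step 1: promote the postgres standby
    update_dns()  # step 2: point the app's DNS record at the secondary site
    return True
```

You would call it from the monitoring tool with something like `fail_over(lambda: primary_is_up("primary.example.com"), lambda: promote_standby("/var/lib/pgsql/data"), my_dns_update)`, where the host name, data directory and `my_dns_update` are placeholders.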
Cloudflare or Amazon's Route 53 can provide the DNS capability. It is simple, reliable and cheap.
Once the primary site is back, split brain shouldn't be a problem, since your DNS record will keep pointing users at your secondary site until you intervene to switch back.
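The "until you intervene" part is worth building into the monitoring logic, so that a recovered primary never gets traffic back on its own. A minimal sketch of that latch follows; the class and method names are made up for illustration and the IP addresses are documentation placeholders. Note that it only governs what new DNS lookups will return.

```python
class SiteSelector:
    """Decides which site's IP the app's DNS record should carry."""

    def __init__(self, primary_ip: str, secondary_ip: str) -> None:
        self.primary_ip = primary_ip
        self.secondary_ip = secondary_ip
        self.failed_over = False

    def observe(self, primary_healthy: bool) -> str:
        """Feed in one health-check result; return the IP DNS should serve."""
        if not primary_healthy:
            # Latch: once set, a healthy primary does not clear it.
            self.failed_over = True
        return self.secondary_ip if self.failed_over else self.primary_ip

    def switch_back(self) -> str:
        """Manual operator action: return traffic to the primary site."""
        self.failed_over = False
        return self.primary_ip


selector = SiteSelector("192.0.2.10", "198.51.100.10")
selector.observe(True)    # primary healthy: DNS stays on the primary
selector.observe(False)   # primary down: DNS moves to the secondary
selector.observe(True)    # primary back: DNS still on the secondary
selector.switch_back()    # only an explicit call moves it back
```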
Or... you can go with BGP and let the network team do the dirty work at the routing level. With BGP you should also expect somewhere between 10 and 120 seconds of downtime until the route changes propagate.
Cheers,
Fernando.