On Thu, Oct 7, 2010 at 12:27 AM, Andy <angelflow@xxxxxxxxx> wrote: > Is there any tutorials or detailed instructions on how to set up HA postgresql & failover? The documentation (http://www.postgresql.org/docs/9.0/interactive/warm-standby-failover.html) on this topics is pretty scarce. > > The scenario I'm most interested in is this: > > 2 servers - a master and a hot standby. All writes are sent to master, reads are split between master and hot standby. To have true redundancy, you need 3 servers. Just saying. Otherwise when one goes down, no more redundancy. > 1) If the hot standby goes down, how do I redirect reads to the master? Have a config file for your app that tells it where to go for reads and writes. Change the config file to point reads at a different db if a read slave fails. What constitutes a failed read slave is kind of a business decision, so you'll likely have to write your own code to decide what being down means. > 2) If the master fails > -how do I automatically promote the standby to master and send all reads/writes to the new master? First you need to decide if you actually want automated failovers. I've seen automated failovers cause as many problems as they were supposed to fix, but it can be done. Keep in mind that on a two db system, failing over means you lose redundancy. If your cluster fails over on a lot of false positives, that's a lot of time with no redundancy. If your script isn't written with having only one node in mind, it might try to failover a second time with no read slave to promote to master. Also, you're going to have to come up with what constitutes a failed master. 30 seconds non-responsive? 5 minutes? An hour? If the problem is that the write master is simply overloaded, then failing over isn't gonna solve anything, as the now newly promoted master is going to collapse as well under even heavier load. It might have been better to adjust the load factors used to determine where read queries go to take load off of the master, or to change a setting in your app that reduces load on the master. With an overloaded write master, then failover, then overloaded even worse new write master you've got a site down, no redundancy, and you need to rebuild your old master as a read slave to handle the load. To start with I do not recommend doing automatic failovers. Have a system in place where your DBA / SA can promote a slave to master in one or two easy steps, and if / when the master truly fails, then run that script. A human can make that decision with far more care than a piece of code. > -what happens when the old master comes back up? Do I need to so anything to make it catches up to the new master? You can't let the old master come back up as thinking it's the master as well. You have to re-establish replication to it as a slave. Again, this is usually not automated, at least not at first. The old master needs to be "shot in the head" so to speak before it comes back up, or your app may start writing to it instead of or as well as the new master, and now you've got split-brain problems. In short automated failover is complicated to get right, and if you get it wrong the cost of the consequences can far worse than the 5 or 10 minutes of downtime required for a manual switch-over. First write scripts that automate most of the task for your application and db farm. Test those scripts as much as you can on a test farm. Then run them when needed by hand when things go wrong. If or when you're certain you've got all the bugs worked out and all the possible failure scenarios worked out, you can start testing automated failover. -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general