In the last exciting episode, dpage@xxxxxxxxxxxxxxxxxx ("Dave Page") wrote:
> If I'm honest, I think your boss is going to be disappointed. You
> would add a *lot* of complexity to the system to make it handle
> failures with zero intervention, and that extra complexity is
> probably more likely to go wrong than a single server. I'd spend
> your time and money on making sure your raid & ups are good, that
> you are running on server grade hardware with ECC RAM, and that you
> have good out of band management facilities so even if you are away
> from the office you can connect via VPN/modem or whatever and fix
> things.

We have found much the same thing in trying to get improved
reliability out of HACMP (an IBM product that automatically fails
applications over between servers).

We had previously experienced too-frequent problems due to the
unreliability of our servers. (Sun high end stuff, as it happened...)
Since moving to HACMP on AIX, well, the IBM AIX servers have been way
more reliable.

Unfortunately, HACMP is all too fragile. It has a lot of "moving
parts" (instances of the "extra complexity" that Dave mentioned), and
apparently keeping it reliable requires enough outages for component
upgrades that it rather undermines the uptime.

My suspicion (not actually confirmable with real numbers; all I can do
is hand-wave) is that if we had spent the money that went into HACMP
on otherwise beefing up the Golden Servers, we'd probably have had
better reliability just by depending on the individual boxes to be
reliable.

In any case, whatever you use for this, whether Slony-I with
"automatic failover" scripts or some sort of "heartbeating/server
takeover" scheme, will suffer from the "too many complex components"
problem.

A vital problem is that it's really hard to validate that the
production configuration is correct. If you made a mistake, it'll all
blow up. And you don't want to run tests that might blow everything
up, do you?
:-)
-- 
let name="cbbrowne" and tld="gmail.com" in name ^ "@" ^ tld;;
http://linuxdatabases.info/info/linuxdistributions.html
"Now, if someone proposed using people who spam comp.sys.* groups with
political screeds in place of lab rats for drug testing, I'd
wholeheartedly concur". -- John C. Randolph