On Mon, 2010-07-12 at 08:58 +0200, Thomas Kellerer wrote:
> Greg Smith, 10.07.2010 14:44:
> >> Is there a difference in how much data could potentially be lost in
> >> case of a failover? E.g. because 9.0 replicates the changes quicker than 8.4?
> >
> > There's nothing that 9.0 does that you can't do with 8.4 and the right
> > software to aggressively ship partial files around. In practice though,
> > streaming shipping is likely to result in less average data loss simply
> > because it will do the right thing to ship new transactions
> > automatically. Getting the same reaction time and resulting low amount
> > of lag out of an earlier version requires a level of external script
> > configuration that few sites ever actually manage to obtain. You can
> > think of the 9.0 features as mainly reducing the complexity of
> > installation needed to achieve low latency significantly. I would bet
> > that if you tried to set up 8.4 to achieve the same quality level in
> > terms of quick replication, your result would be more fragile and buggy
> > than just using 9.0--the bugs would just be in your own code rather
> > than in the core server.
> >
>
> Greg and Rob,
>
> thanks for the answers.
>
> I didn't "plan" (or expect) to get the same level of reliability from a
> "standard" 8.4 HA installation, so I don't think I would go that way. If
> we do need that level, we'd go for 9.0 or for some other solution.
>
> The manual lists three possible solutions to HA: shared disk failover,
> file system replication and Warm/Hot Standby. I'm not an admin (nor a
> DBA), so my question might sound a bit stupid: from my point of view
> solutions using shared disk failover or file system replication seem to
> be more reliable in terms of how much data can get lost (and possibly
> the switch over lag).

With Shared Disk failover, you don't use filesystem replication.
Your disk resources are available to a secondary server, and in the event of a failure of the primary server, your secondary takes ownership of the disk resources.

The nice thing about shared disk solutions is that you won't lose any committed data if a server fails. The downsides are that a shared disk can be really tough to set up properly, and your storage is still a single point of failure, so you need to make sure that it's reliable and most likely still use alternate means to protect against failure of the storage.

Warm/Hot Standby is a lot easier to set up, but there is a window for data loss on failure. This can be minimized/eliminated by using some sort of block level synchronous replication (DRBD file system, array or SAN based) if you can afford the overhead. I don't have any first hand experience with the sync based stuff, so I can't comment much further than that.

Switchover times are really going to vary. For shared clusters, there is some overhead in dealing with the low level disk stuff, but I find it's not that bad. The bigger issue on switchover is whether or not you have time to call a fast shutdown instead of having the server do a hard crash. If it's a hard crash (which it usually is), you'll start up in recovery mode on the secondary server and have to replay through the WAL. If you have a lot of WAL files to replay on start up, the switchover time can be quite long.

Warm/Hot Standby tends to be faster on failover as long as you are applying the WAL files at a reasonable rate.

One further thing to mention: all of these solutions are based on making the physical blocks available (actually, I'm not sure about Streaming Replication in 9.0). As such, it is possible for corruption to hit the master at the block level and get replicated through the chain. Logical solutions like Slony/Bucardo/Londiste do give some additional protection against this.

--
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.
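
P.S. To put rough numbers on the switchover and data-loss points above, here is a back-of-the-envelope sketch in Python. It is only an illustration under stated assumptions: the function names are made up for this example, the replay rate and shipping delay are values you would have to measure on your own hardware, and the loss estimate assumes file-based WAL shipping where a segment is shipped at the latest when archive_timeout expires. The 16 MB segment size is the PostgreSQL default.

```python
# Rough estimates only; names and the simple model are assumptions for
# illustration, not anything PostgreSQL itself provides.

WAL_SEGMENT_MB = 16  # default WAL segment size in PostgreSQL


def replay_seconds(pending_segments, replay_mb_per_sec):
    """Approximate time to replay the WAL segments still pending at startup."""
    if replay_mb_per_sec <= 0:
        raise ValueError("replay rate must be positive")
    return pending_segments * WAL_SEGMENT_MB / replay_mb_per_sec


def worst_case_loss_seconds(archive_timeout_sec, ship_delay_sec):
    """Approximate upper bound on the window of lost commits with
    file-based WAL shipping: time until the segment is forced out
    plus the time needed to copy it to the standby."""
    return archive_timeout_sec + ship_delay_sec


if __name__ == "__main__":
    # 200 pending segments at an assumed 20 MB/s replay rate
    print(replay_seconds(200, 20.0), "seconds of replay")
    # assumed archive_timeout of 60 s plus 5 s to copy the file
    print(worst_case_loss_seconds(60, 5), "seconds of possible data loss")
```

The point of the first function is that replay time grows linearly with the backlog, which is why a standby that keeps up with the WAL fails over quickly while a crashed shared-disk node with a long backlog does not.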
-- Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx)