On Wed, Nov 11, 2009 at 01:35:58PM -0500, Greg Smith wrote:
- David Kerr wrote:
- >The apps actually aren't as robust as the DB in this case, so I'll have
- >time to replay all of the logs that made it before "the big one" while
- >those are being configured to come up. And if it does take longer, that's
- >not a huge issue; I'll have a few hours to get 100% caught up.
- >
- It sounds like you've got the basics nailed down here and are on a
- well-trod path, just one not documented publicly very well. Since you
- said that even DRBD was too much overhead for you, I think a dive into
- evaluating the commercial clustering approaches (or the free LinuxHA
- that RedHat's is based on, which I haven't been real impressed by) would
- be appropriate. The hard part is generally getting a heartbeat between
- the two servers sharing the SAN that is sensitive enough to catch
- failures while not being so paranoid that it fails over needlessly (say,
- when load spikes on the primary and it slows down). Make sure you test
- that part out very carefully with any vendor you evaluate.
-
- As far as the PostgreSQL specifics go, you need a solid way to ensure
- you've disconnected the now defunct master from the SAN (the classic
- "shoot the other node in the head" problem). All you *should* have to
- do is start the database again on the backup after doing that. That
- will come up as a standard crash, run through WAL replay crash recovery,
- and the result should be no different than had you restarted after a
- crash on the original node. The thing you cannot let happen is allowing
- the original master to continue writing to the shared SAN volume once
- that transition has happened.

Thanks Greg, that sounds good, and it puts my (and my management's)
concerns at ease!

Dave
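
For anyone finding this thread later, here is a rough sketch of the two points Greg raises: a heartbeat that tolerates a few missed beats before declaring the primary dead, and a failover that refuses to start the database until the old master has been fenced off from the SAN. Everything here is an assumption for illustration: the names `HeartbeatMonitor` and `fail_over`, the interval and threshold values, and the two callables are hypothetical. Real fencing is vendor-specific (a power switch, a SAN port disable, and so on), and the start step would normally amount to something like running `pg_ctl start` against the shared data directory.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class HeartbeatMonitor:
    """Declare the primary dead only after a sustained silence.

    interval_s and max_missed are made-up values; tune them so a load
    spike on the primary does not trigger a needless failover.
    """
    interval_s: float = 2.0   # assumed gap between expected heartbeats
    max_missed: int = 5       # assumed number of beats we tolerate missing
    last_seen: float = 0.0    # timestamp of the most recent heartbeat

    def beat(self, now: float) -> None:
        # Record that a heartbeat from the primary arrived at time `now`.
        self.last_seen = now

    def primary_is_dead(self, now: float) -> bool:
        # Dead only when the silence exceeds interval_s * max_missed.
        return (now - self.last_seen) > self.interval_s * self.max_missed


def fail_over(fence_old_master: Callable[[], bool],
              start_postgres: Callable[[], None]) -> bool:
    """Start the standby only after fencing is confirmed.

    fence_old_master is a hypothetical hook that must return True only
    when the old master can no longer write to the shared volume.
    start_postgres is a hypothetical hook that starts the database on
    the backup node, which then runs normal WAL crash recovery.
    """
    if not fence_old_master():
        # Never start a second writer on the shared SAN volume.
        return False
    start_postgres()
    return True
```

The ordering inside `fail_over` is the important part: if fencing cannot be confirmed, the sketch does nothing at all, since two nodes writing to the same volume is the one outcome that must not happen.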