Re: postgres in HA constellation

Andrew Sullivan <ajs@xxxxxxxxxxxxxxx> · Wed, 11 Oct 2006 10:28:44 -0400

On Thu, Oct 05, 2006 at 08:43:21PM -0500, Jim Nasby wrote:
> Isn't it entirely possible that if the master gets trashed it would  
> start sending garbage to the Slony slave as well?

Well, maybe, but it's unlikely.  What happens in a shared-disc
failover is that the second machine re-mounts the same partition
that the old machine had open.  The risk is the case where your
to-be-removed machine hasn't actually stopped writing to the
partition yet, but your failover software thinks it's dead and
fails over anyway.  Then two processes have the same Postgres data
and WAL files open at the same time, and blammo.  As nearly as I
can tell, it takes approximately zero time for this arrangement to
make such a mess that you're not committing any transactions.
Slony only picks up data on COMMIT, so the risk of garbage reaching
the slave is very small.
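
To make the race concrete, here is a toy sketch in Python.  It is
not taken from any real failover package: Node, presumed_dead and
writers_after_failover are names made up for illustration, and the
only assumption is that the monitor judges liveness by a heartbeat
timeout.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    last_heartbeat: float  # time the monitor last heard from this node
    still_writing: bool    # ground truth the monitor cannot observe


def presumed_dead(node: Node, now: float, timeout: float) -> bool:
    """The monitor's only evidence: no heartbeat within `timeout` seconds."""
    return now - node.last_heartbeat > timeout


def writers_after_failover(old: Node, now: float, timeout: float) -> int:
    """Count processes holding the data and WAL files once the monitor acts."""
    writers = 1 if old.still_writing else 0
    if presumed_dead(old, now, timeout):
        writers += 1  # the standby mounts the partition and starts Postgres
    return writers


# A stalled heartbeat link, not a dead machine: the old node still writes.
old = Node("db1", last_heartbeat=100.0, still_writing=True)
print(writers_after_failover(old, now=115.0, timeout=10.0))  # prints 2
```

Any result above 1 is the blammo case: the monitor was wrong about
the old machine, and both of them are writing to the same files.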

> I think PITR would be a much better option to protect against this,  
> since you could probably recover up to the exact point of failover.

That oughta work too, except that your remounted WAL gets corrupted
under the imagined scenario, and then you copy the corrupted
updates over your copy of the WAL.  So you have to keep every
incremental copy of the WAL you make, rather than overwriting the
last one, so that you aren't left with only a garbage file to read.
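
A minimal sketch of what saving every copy might look like, in
Python.  The helper save_wal_copy and the naming scheme (a timestamp
plus a sequence number) are my own assumptions for illustration,
not anything Postgres provides.  The point is only that each copy
gets a fresh name, so a later, corrupted copy can never replace an
earlier good one.

```python
import shutil
import time
from pathlib import Path


def save_wal_copy(wal_file: Path, archive_dir: Path) -> Path:
    """Copy wal_file into archive_dir under a new name; never overwrite."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%dT%H%M%S")
    seq = 0
    while True:
        # Hypothetical naming scheme: <segment name>.<timestamp>.<sequence>
        dest = archive_dir / f"{wal_file.name}.{stamp}.{seq:04d}"
        if not dest.exists():
            break
        seq += 1
    shutil.copy2(wal_file, dest)  # copies contents and file metadata
    return dest
```

If the newest copy turns out to be garbage, you can fall back to
the one before it instead of having nothing usable.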

As I said, I don't think that it's a bad idea to use this sort of
trick.  I just think it's a poor single line of defence, because when
it fails, it fails hard.

A

-- 
Andrew Sullivan  | ajs@xxxxxxxxxxxxxxx
In the future this spectacle of the middle classes shocking the avant-
garde will probably become the textbook definition of Postmodernism. 
                --Brad Holland