Re: postgres in HA constellation

Brad Nicholson <bnichols@xxxxxxxxxxxxxxx> · Fri, 13 Oct 2006 11:52:14 -0400

On Wed, 2006-10-11 at 16:12 -0500, Jim C. Nasby wrote:
> On Wed, Oct 11, 2006 at 10:28:44AM -0400, Andrew Sullivan wrote:
> > On Thu, Oct 05, 2006 at 08:43:21PM -0500, Jim Nasby wrote:
> > > Isn't it entirely possible that if the master gets trashed it would  
> > > start sending garbage to the Slony slave as well?
> > 
> > Well, maybe, but unlikely.  What happens in a shared-disc failover is
> > that the second machine re-mounts the same partition as the old
> > machine had open.  The risk is the case where your to-be-removed
> > machine hasn't actually stopped writing on the partition yet, but
> > your failover software thinks it's dead, and can fail over.  Two
> > processes have the same Postgres data and WAL files mounted at the
> > same time, and blammo.  As nearly as I can tell, it takes
> > approximately zero time for this arrangement to make such a mess that
> > you're not committing any transactions.  Slony will only get the data
> > on COMMIT, so the risk is very small.
>  
> Hrm... I guess it depends on how quickly the Slony master would stop
> processing if it was talking to a shared-disk that had become corrupt
> from another postmaster.

That doesn't depend on Slony; it depends on Postgres.  If transactions
are committing on the master, Slony will replicate them.  You could have
a situation where your HA failover trashes some of your database, but the
database still starts up.  It then starts accepting and replicating
transactions before the corruption is discovered.
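A toy Python model may help separate the two cases in this thread. It is not Slony's actual mechanism. It assumes only that a change is logged for the slave as part of a committed transaction, so work that never commits is never replicated. The `Master`/`Slave` classes, the `can_commit` flag and the row strings are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Master:
    """Hypothetical master: a change is logged for the slave only if it commits."""
    committed: list = field(default_factory=list)
    replication_log: list = field(default_factory=list)
    # False models corruption bad enough that no transaction can commit.
    can_commit: bool = True

    def run_transaction(self, rows):
        if not self.can_commit:
            return False  # transaction aborts; nothing reaches the log
        self.committed.extend(rows)
        # The log entry commits together with the data it describes.
        self.replication_log.extend(rows)
        return True


@dataclass
class Slave:
    rows: list = field(default_factory=list)

    def sync(self, master):
        # The slave only ever sees what the master's log contains.
        self.rows = list(master.replication_log)


# Case 1: the shared-disk mess stops commits, so the garbage never replicates.
m1, s1 = Master(), Slave()
m1.run_transaction(["good row"])
m1.can_commit = False
m1.run_transaction(["garbage row"])
s1.sync(m1)

# Case 2: the damaged database still starts and commits, so bad data replicates.
m2, s2 = Master(), Slave()
m2.run_transaction(["row built from corrupted pages"])
s2.sync(m2)
```

The model makes the point above explicit. The replicator has no way to judge whether committed data is sane. It forwards whatever commits, so the only protection is that corruption stops commits before any bad data reaches the log.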

Brad.