On Wed, Oct 11, 2006 at 10:28:44AM -0400, Andrew Sullivan wrote:
> On Thu, Oct 05, 2006 at 08:43:21PM -0500, Jim Nasby wrote:
> > Isn't it entirely possible that if the master gets trashed it would
> > start sending garbage to the Slony slave as well?
>
> Well, maybe, but unlikely. What happens in a shared-disc failover is
> that the second machine re-mounts the same partition as the old
> machine had open. The risk is the case where your to-be-removed
> machine hasn't actually stopped writing on the partition yet, but
> your failover software thinks it's dead, and can fail over. Two
> processes have the same Postgres data and WAL files mounted at the
> same time, and blammo. As nearly as I can tell, it takes
> approximately zero time for this arrangement to make such a mess that
> you're not committing any transactions. Slony will only get the data
> on COMMIT, so the risk is very small.

Hrm... I guess it depends on how quickly the Slony master would stop
processing if it was talking to a shared-disk that had become corrupt
from another postmaster.

> > I think PITR would be a much better option to protect against this,
> > since you could probably recover up to the exact point of failover.
>
> That oughta work too, except that your remounted WAL gets corrupted
> under the imagined scenario, and then you copy the next updates to
> the WAL. So you have to save all the incremental copies of the WAL
> you make, so that you don't have a garbage file to read.
>
> As I said, I don't think that it's a bad idea to use this sort of
> trick. I just think it's a poor single line of defence, because when
> it fails, it fails hard.

Yeah, STONITH is *critical* for shared-disk.
--
Jim Nasby                                            jim@xxxxxxxxx
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)