Re: pgbackrest - question about restoring cluster to a new cluster on same server

David Steele <david@xxxxxxxxxxxxx> · Wed, 18 Sep 2019 21:58:11 -0400

On 9/18/19 9:40 PM, Ron wrote:
> 
> I'm concerned with one pgbackrest process stepping over another one and
> the restore (or the "pg_ctl start" recovery phase) accidentally
> corrupting the production database by writing WAL files to the original
> cluster.

This is not an issue unless you seriously game the system.  When a
cluster is promoted it selects a new timeline and all WAL will be
archived to the repo on that new timeline.  It's possible to promote a
cluster without a timeline switch by tricking it, but this is obviously a
bad idea.

So, if you promote the new cluster and forget to disable archive_command,
there will be no conflict because the two clusters will be generating WAL
on separate timelines.
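
To illustrate why separate timelines don't collide: a PostgreSQL WAL
segment file name is 24 hex digits, and the first 8 are the timeline ID.
Here is a minimal Python sketch of that.  The helper name and the two
sample segment names are made up for illustration; this is not
pgbackrest code.

```python
import re

# A WAL segment file name is 24 uppercase hex digits:
# 8 for the timeline ID, then 16 for the segment position.
WAL_SEGMENT_RE = re.compile(r"[0-9A-F]{24}")


def timeline_of(segment_name: str) -> int:
    """Return the timeline ID encoded in a WAL segment file name.

    Hypothetical helper for illustration only.
    """
    if not WAL_SEGMENT_RE.fullmatch(segment_name):
        raise ValueError(f"not a WAL segment name: {segment_name!r}")
    return int(segment_name[:8], 16)


# Made-up sample names: same WAL position, different timelines.
production_segment = "000000010000000A000000FF"  # original cluster
restored_segment = "000000020000000A000000FF"    # promoted copy

# The names differ, so the two files cannot overwrite each other.
names_collide = production_segment == restored_segment
```

Because the timeline is part of the file name, the segment written by
the promoted copy is a different file from the one written by the
original cluster, even at the same WAL position.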

In the case of a future failover a higher timeline will be selected so
there still won't be a conflict.

Unfortunately, that dead WAL from the rogue cluster will persist in the
repo until a PostgreSQL upgrade, because the expire command has no
context to tell it when that WAL can be removed.  We're not quite sure
how to handle this but it seems a relatively minor issue, at least as far
as consistency is concerned.

If you do have a split-brain situation where two primaries are archiving
on the same timeline then first-in wins.  WAL from the losing primary
will be rejected.
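
As a rough mental model of "first-in wins", here is a toy Python sketch.
It is not how pgbackrest is implemented; the class, the method, and the
choice to compare content hashes are all assumptions made for
illustration.

```python
import hashlib


class ToyArchive:
    """Toy stand-in for a WAL archive (hypothetical, not pgbackrest)."""

    def __init__(self) -> None:
        # Maps segment file name -> SHA-1 hex digest of its content.
        self._segments: dict[str, str] = {}

    def push(self, name: str, data: bytes) -> bool:
        """Store a segment.

        Returns True if stored, False if an identical copy was already
        present.  Raises FileExistsError if a different segment with
        the same name got there first.
        """
        digest = hashlib.sha1(data).hexdigest()
        existing = self._segments.get(name)
        if existing is None:
            self._segments[name] = digest
            return True
        if existing == digest:
            # Assumption: an identical re-push is treated as harmless.
            return False
        raise FileExistsError(f"{name} already archived with other content")


repo = ToyArchive()
segment = "000000010000000A000000FF"  # made-up name, same timeline

# Primary A archives first and wins.
first_push_stored = repo.push(segment, b"WAL written by primary A")

# Primary B tries the same segment name with different content.
try:
    repo.push(segment, b"WAL written by primary B")
    second_push_rejected = False
except FileExistsError:
    second_push_rejected = True
```

In this model the archive keeps whatever arrived first, and the losing
primary's push fails, so its archive_command would report an error
rather than silently overwriting anything.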

Regards,
-- 
-David
david@xxxxxxxxxxxxx