Re: pgbackrest - question about restoring cluster to a new cluster on same server

Jerry Sievers <gsievers19@xxxxxxxxxxx> · Wed, 18 Sep 2019 21:18:24 -0500

David Steele <david@xxxxxxxxxxxxx> writes:

> On 9/18/19 9:40 PM, Ron wrote:
>
>> 
>> I'm concerned with one pgbackrest process stepping over another one and
>> the restore (or the "pg_ctl start" recovery phase) accidentally
>> corrupting the production database by writing WAL files to the original
>> cluster.
>
> This is not an issue unless you seriously game the system.  When a

And/or your recovery system is running archive_mode=always :-)

I don't know how popular that setting is, but combine it with an
archive_command identical to the origin's and you get duplicate
archival, with whatever consequences follow.

Disclaimer: I don't know if pgbackrest guards against such a
configuration.

> cluster is promoted it selects a new timeline and all WAL will be
> archived to the repo on that new timeline.  It's possible to promote a
> cluster without a timeline switch by tricking it but this is obviously a
> bad idea.
>
> So, if you promote the new cluster and forget to disable archive_command
> there will be no conflict because the clusters will be generating WAL on
> separate timelines.
>
> In the case of a future failover a higher timeline will be selected so
> there still won't be a conflict.
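
For anyone following along: the timeline is visible in the WAL segment
file name itself, since the first 8 of its 24 hex digits are the timeline
ID. A minimal sketch of reading it, assuming standard segment names; the
helper name wal_timeline and the sample segment names are made up for
illustration.

```python
import re

# A WAL segment file name is 24 uppercase hex digits:
# 8 for the timeline ID, then 16 for the segment position.
WAL_SEGMENT_RE = re.compile(r"^[0-9A-F]{24}$")


def wal_timeline(segment_name: str) -> int:
    """Return the timeline ID encoded in a WAL segment file name."""
    if not WAL_SEGMENT_RE.match(segment_name):
        raise ValueError(f"not a WAL segment name: {segment_name!r}")
    return int(segment_name[:8], 16)


# Hypothetical segments: same position, different timelines.
original = "000000010000000A000000FF"   # origin cluster, timeline 1
restored = "000000020000000A000000FF"   # promoted copy, timeline 2

# The names differ, so the two files cannot collide in the archive.
print(wal_timeline(original), wal_timeline(restored))
```

So two clusters at the same WAL position but on different timelines
produce different file names.
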
>
> Unfortunately, that dead WAL from the rogue cluster will persist in the
> repo until a PostgreSQL upgrade because expire doesn't know when it can
> be removed since it has no context.  We're not quite sure how to handle
> this but it seems a relatively minor issue, at least as far as
> consistency is concerned.
>
> If you do have a split-brain situation where two primaries are archiving
> on the same timeline then first-in wins.  WAL from the losing primary
> will be rejected.
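
To make "first-in wins" concrete, here is a toy model, not pgbackrest's
actual code. ToyArchive and its push method are invented names, and
tolerating a byte-identical re-push is my assumption, not something
stated above.

```python
import hashlib


class ToyArchive:
    """Toy stand-in for a WAL archive: first writer of a name wins."""

    def __init__(self) -> None:
        # Maps segment name -> SHA-1 hex digest of the stored content.
        self._segments: dict[str, str] = {}

    def push(self, name: str, content: bytes) -> str:
        digest = hashlib.sha1(content).hexdigest()
        existing = self._segments.get(name)
        if existing is None:
            self._segments[name] = digest
            return "stored"
        if existing == digest:
            # Assumption: an identical re-push is harmless, so accept it.
            return "duplicate"
        # Same name, different content: the later writer loses.
        raise ValueError(f"{name} already archived with different content")


repo = ToyArchive()
segment = "000000010000000000000001"  # hypothetical segment name
print(repo.push(segment, b"WAL from primary A"))
try:
    repo.push(segment, b"WAL from primary B")
except ValueError as err:
    print("rejected:", err)
```

The second primary's segment has the same name but different content, so
the push fails and the first copy stays in place.
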
>
> Regards,

-- 
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@xxxxxxxxxxx