Re: pg_basebackup + incremental base backups

Stephen Frost <sfrost@xxxxxxxxxxx> · Sun, 24 May 2020 15:48:33 -0400

Greetings,

* Christopher Pereira (kripper@xxxxxxxxxxxx) wrote:
> Ok, we want to use pgbackrest to *rebuild a standby that has fallen behind*
> (where pg_rewind won't work). After reading the docs, we believe we should
> use this setup:
> 
> a) Primary host: primary cluster
> 
> b) Repository host: needed for rebuilding the standby (and having PITR as
> bonus).
> 
> c) Standby host: standby cluster

That would work.

> 1) The standby will use streaming replication and will be in sync until
> someday something funny happens and both standby and repository get out of
> sync with the primary.

Having failures in both the standby and repo would have to be something
very 'funny' indeed..

> Now, to rebuild the standby first we will have to create a new backup
> transferring the data from *primary -> repository*, right?

If the repo is entirely gone *and* the standby server is broken, then,
yes, you'd have to do a new backup into the repo and then restore that.

> Wouldn't this also have a load impact on the primary cluster?

Yes.

The distinction I was trying to make earlier is that if the standby
server gets messed up in any shape, and the repo/repo-server is still
operating correctly, then you can perform a pgbackrest delta restore
whereby the standby is rebuilt from the last backup that was performed.
That process only involves the repo server and the standby server, there
wouldn't be any load on the primary, and further, only those files which
are different on the standby server vs. the last backup in the repo
would be copied, minimizing bandwidth between the two.

> 2) In the user guide section 17.3 is explained how to create a "pg-standby
> host" to replicate the data *from the repository host*.
> And in section 17.4 is explained how to setup Streaming Replication to
> replicate the data *from the primary host*.
> Do 17.3 and 17.4 work together so that the data is *replicated from the
> repository* and then *streamed from the primary*?

It's more of an either-or.  That is, if you have a pgbackrest repo, and
you configure PostgreSQL to have a restore_command which will fetch WAL
from the repo *and* you configure PostgreSQL to have a primary_conninfo
line to connect to an upstream primary and stream WAL, then you give PG
two ways to get the WAL and it'll try one and then the other.  If either
are able to provide the necessary WAL, then the replica will replay the
WAL until it gets to the end of all available WAL, at which point it
should be able to connect to the primary and stream WAL.

Should the replica ever fall behind and the primary no longer has the
WAL that the replica needs, it'll automatically go to the repo to fetch
that WAL.  Only if the WAL is no longer available from either the
primary or the WAL repo is the replica so far behind that it would need
a delta restore from a backup to be caught back up.

> 3) Before being able to rebuild the standby cluster, would we first need to
> update the backup on the repository (backup from primary -> repository) in
> order for streaming replication to work (from primary -> standby)?

This isn't required, no.  Doing so, however, perhaps using a pgbackrest
incremental backup, would reduce the amount of WAL the replica would
need to replay to get caught back up with the primary though.

> 4) Once the backup on the repository is ready, what are the chances that
> streaming replication from primary to standby won't work because they got
> out of sync again?

That really shouldn't be possible.  I'm very curious as to exactly what
happened that resulted in your primary/replica being 'out of sync', as
you say.

> 5) Could we just work with 2 hosts (primary and standby) instead of 3?
> FAQ section 8 says the repository shouldn't be on the same host as the
> standby and having it on the primary doesn't make much sense because if the
> primary host is down we won't have access to the backup.

It's possible but it's really not recommended, particularly if the idea
is to have the primary/replica providing high availability.  When you
are setting up such an HA solution, you *really* want to have your
configurations and such be symmetric between the two (and pgbackrest
makes this easy for you, when you have a repo host, by allowing you to
configure both in the pgbackrest config file and pgbackrest will figure
out which is primary and which is the replica, and perform the backup
from whichever one you'd prefer to).

In the 2 node configuration you're contemplating, if the replia is down,
the primary won't be able to ship WAL off of the system, resulting in a
WAL buildup on the primary which could lead to a database failure due to
running out of disk space, and further increases risk that the primary
then fails and you have lost both availability *and* whatever had been
still on the primary, potentially going back hours or more.

> It would be ideal to have the repository on the standby host and taking good
> care of the configurations. What exactly should be cared of for this setup
> to be safe?

No, that's really not an ideal configuration.  In such a situation,
there's no way to make the configurations symmetric, so what happens in
a failover?  Do you move the repo over to the old primary once you have
things back up?  That seems unlikely to make sense, so you're left
having to potentially rebuild the old primary and then flip back over to
it, while you're in a pretty rough state regarding your availability as
all of that will take time during which you're at a high risk of the old
standby/now primary going down.

> I'm afraid I'm not understanding very well the pgbackrest design or how to
> use it efficiently to rebuild a standby cluster that got out of sync.

Note that you don't actually have to have another server to be the repo
host- you can backup to s3 or anything which supports s3 (as long as
it's sufficiently compatible... not all are, but they're getting
better) with pgbackrest, and we're adding support for Azure and GCS
before too much longer (looks likely that we'll have Azure support in
the June release, GCS at some point after that).

Thanks,

Stephen
Attachment:
signature.asc

Description: PGP signature