We've contemplated adding support for something like this to pgbackrest, since all the pieces are there, but there hasn't been a lot of demand for it and it kind of goes against the idea of having a proper backup solution. It'd also create quite a bit of load on the primary to checksum all the files for comparison against what's on the replica that you're trying to update, so it's probably not something you'd want to do more often than necessary.
Ok, we want to use pgbackrest to rebuild a standby that has
fallen behind (where pg_rewind won't work). After reading
the docs, we believe we should use this setup:
a) Primary host: primary cluster
b) Repository host: needed for rebuilding the standby (and having PITR as bonus).
c) Standby host: standby cluster
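For reference, this is roughly the configuration we have in mind for (a) and (b), pieced together from the user guide. It is only a sketch of our understanding: the stanza name "main", the host names "pg-primary" and "repository", and all paths are placeholders of ours, not values taken from the docs.

```ini
# --- pgbackrest.conf on the primary host (a) ---
# Placeholder host name and paths; adjust to the real environment.
[global]
repo1-host=repository

[main]
pg1-path=/var/lib/postgresql/12/main

# --- pgbackrest.conf on the repository host (b) ---
[global]
repo1-path=/var/lib/pgbackrest
repo1-retention-full=2

[main]
pg1-host=pg-primary
pg1-path=/var/lib/postgresql/12/main
```

With this layout we would expect backups to be started from the repository host (pgbackrest --stanza=main backup) and WAL to be archived from the primary via archive_command.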
Some questions:
1) The standby will use streaming replication and will be in sync
until someday something funny happens and both standby and
repository get out of sync with the primary.
Now, to rebuild the standby first we will have to create a new
backup transferring the data from primary -> repository,
right?
Wouldn't this also have a load impact on the primary cluster?
2) Section 17.3 of the user guide explains how to create a
"pg-standby host" to replicate the data from the repository
host.
And section 17.4 explains how to set up Streaming
Replication to replicate the data from the primary host.
Do 17.3 and 17.4 work together so that the data is replicated
from the repository and then streamed from the primary?
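To make the question concrete, this is how we currently imagine the standby configuration if both sections are combined. Again, this is only our reading of the guide: the stanza "main", the hosts "repository" and "pg-primary", the path, and the "replicator" user are placeholders we made up.

```ini
# --- pgbackrest.conf on the standby host (c) ---
# Placeholder names; our assumption of how 17.3 and 17.4 combine.
[global]
repo1-host=repository

[main]
pg1-path=/var/lib/postgresql/12/main
# Passed into the recovery settings on restore, so that the standby
# can connect to the primary for streaming after replaying from the archive.
recovery-option=primary_conninfo=host=pg-primary port=5432 user=replicator
```

We would then rebuild the standby with something like "pgbackrest --stanza=main --delta --type=standby restore" and start PostgreSQL, expecting it to replay WAL from the repository first and switch to streaming afterwards.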
3) Before being able to rebuild the standby cluster, would we first need to update the backup on the repository (backup from primary -> repository) in order for streaming replication to work (from primary -> standby)?
4) Once the backup on the repository is ready, what are the chances that streaming replication from primary to standby won't work because they got out of sync again?
5) Could we just work with 2 hosts (primary and standby) instead
of 3?
FAQ section 8 says the repository shouldn't be on the same host as
the standby and having it on the primary doesn't make much sense
because if the primary host is down we won't have access to the
backup.
It would be ideal for us to have the repository on the standby host,
taking good care of the configuration. What exactly should we
take care of for this setup to be safe?
I'm afraid I don't fully understand the pgbackrest design, or how to use it efficiently to rebuild a standby cluster that got out of sync.