Greetings, * Christopher Pereira (kripper@xxxxxxxxxxxx) wrote: > Ok, we want to use pgbackrest to *rebuild a standby that has fallen behind* > (where pg_rewind won't work). After reading the docs, we believe we should > use this setup: > > a) Primary host: primary cluster > > b) Repository host: needed for rebuilding the standby (and having PITR as > bonus). > > c) Standby host: standby cluster That would work. > 1) The standby will use streaming replication and will be in sync until > someday something funny happens and both standby and repository get out of > sync with the primary. Having failures in both the standby and repo would have to be something very 'funny' indeed.. > Now, to rebuild the standby first we will have to create a new backup > transferring the data from *primary -> repository*, right? If the repo is entirely gone *and* the standby server is broken, then, yes, you'd have to do a new backup into the repo and then restore that. > Wouldn't this also have a load impact on the primary cluster? Yes. The distinction I was trying to make earlier is that if the standby server gets messed up in any shape, and the repo/repo-server is still operating correctly, then you can perform a pgbackrest delta restore whereby the standby is rebuilt from the last backup that was performed. That process only involves the repo server and the standby server, there wouldn't be any load on the primary, and further, only those files which are different on the standby server vs. the last backup in the repo would be copied, minimizing bandwidth between the two. > 2) In the user guide section 17.3 is explained how to create a "pg-standby > host" to replicate the data *from the repository host*. > And in section 17.4 is explained how to setup Streaming Replication to > replicate the data *from the primary host*. > Do 17.3 and 17.4 work together so that the data is *replicated from the > repository* and then *streamed from the primary*? It's more of an either-or. That is, if you have a pgbackrest repo, and you configure PostgreSQL to have a restore_command which will fetch WAL from the repo *and* you configure PostgreSQL to have a primary_conninfo line to connect to an upstream primary and stream WAL, then you give PG two ways to get the WAL and it'll try one and then the other. If either are able to provide the necessary WAL, then the replica will replay the WAL until it gets to the end of all available WAL, at which point it should be able to connect to the primary and stream WAL. Should the replica ever fall behind and the primary no longer has the WAL that the replica needs, it'll automatically go to the repo to fetch that WAL. Only if the WAL is no longer available from either the primary or the WAL repo is the replica so far behind that it would need a delta restore from a backup to be caught back up. > 3) Before being able to rebuild the standby cluster, would we first need to > update the backup on the repository (backup from primary -> repository) in > order for streaming replication to work (from primary -> standby)? This isn't required, no. Doing so, however, perhaps using a pgbackrest incremental backup, would reduce the amount of WAL the replica would need to replay to get caught back up with the primary though. > 4) Once the backup on the repository is ready, what are the chances that > streaming replication from primary to standby won't work because they got > out of sync again? That really shouldn't be possible. I'm very curious as to exactly what happened that resulted in your primary/replica being 'out of sync', as you say. > 5) Could we just work with 2 hosts (primary and standby) instead of 3? > FAQ section 8 says the repository shouldn't be on the same host as the > standby and having it on the primary doesn't make much sense because if the > primary host is down we won't have access to the backup. It's possible but it's really not recommended, particularly if the idea is to have the primary/replica providing high availability. When you are setting up such an HA solution, you *really* want to have your configurations and such be symmetric between the two (and pgbackrest makes this easy for you, when you have a repo host, by allowing you to configure both in the pgbackrest config file and pgbackrest will figure out which is primary and which is the replica, and perform the backup from whichever one you'd prefer to). In the 2 node configuration you're contemplating, if the replia is down, the primary won't be able to ship WAL off of the system, resulting in a WAL buildup on the primary which could lead to a database failure due to running out of disk space, and further increases risk that the primary then fails and you have lost both availability *and* whatever had been still on the primary, potentially going back hours or more. > It would be ideal to have the repository on the standby host and taking good > care of the configurations. What exactly should be cared of for this setup > to be safe? No, that's really not an ideal configuration. In such a situation, there's no way to make the configurations symmetric, so what happens in a failover? Do you move the repo over to the old primary once you have things back up? That seems unlikely to make sense, so you're left having to potentially rebuild the old primary and then flip back over to it, while you're in a pretty rough state regarding your availability as all of that will take time during which you're at a high risk of the old standby/now primary going down. > I'm afraid I'm not understanding very well the pgbackrest design or how to > use it efficiently to rebuild a standby cluster that got out of sync. Note that you don't actually have to have another server to be the repo host- you can backup to s3 or anything which supports s3 (as long as it's sufficiently compatible... not all are, but they're getting better) with pgbackrest, and we're adding support for Azure and GCS before too much longer (looks likely that we'll have Azure support in the June release, GCS at some point after that). Thanks, Stephen
Attachment:
signature.asc
Description: PGP signature