Greetings,

* Christopher Pereira (kripper@xxxxxxxxxxxx) wrote:
> On 21-May-20 08:43, Stephen Frost wrote:
> >* Christopher Pereira (kripper@xxxxxxxxxxxx) wrote:
> >>Is there some way to rebuild the standby cluster by doing a differential
> >>backup of the primary cluster directly?
> >We've contemplated adding support for something like this to pgbackrest,
> >since all the pieces are there, but there hasn't been a lot of demand
> >for it and it kind of goes against the idea of having a proper backup
> >solution, really..  It'd also create quite a bit of load on the primary
> >to checksum all the files to do the comparison against what's on the
> >replica that you're trying to update, so not something you'd probably
> >want to do a lot more than necessary.
>
> We have backups of the whole server and only need a efficient way to rebuild
> the hot-standby cluster when pg_rewind is not able to do so.

Personally, I find myself more confident in what pgbackrest does to
remaster a former primary (using a delta restore), but a lot of that
really comes down to the question of: why did the primary fail?  If you
don't know that, I really wouldn't recommend using pg_rewind.

> I agree with your concerns about the increased load on the primary server,
> but this rebuilding process would only be done in case of emergency or
> during low load hours.
>
> pg_basebackup works fine but does not support differential/incremental
> backups which is a blocker.

pg_basebackup is missing an awful lot of other things- managing of
backup rotation, WAL expiration, the ability to parallelize, encryption
support, ability to push backups/fetch backups to/from cloud storage
solutions, ability to resume from failed backups, delta restore (which
is more-or-less what you're asking for), parallel archiving/fetching of
WAL..

> Do you know any alternative software that is able to rebuild the standby PG
> data dir using rsync or similar while the primary is still online?

pgbackrest can certainly rebuild the standby, if you're using it for
backups, and do so very quickly thanks to delta restore and its
parallelism.  I'm not aware of anything that does exactly what you're
looking for.

> It seems a simple pg_start_backup + rsync + pg_stop_backup (maybe combined
> with a LVM snapshot) would do, but we would prefer to use some existing
> tool.

I'd strongly recommend that you use an existing tool.  There's an awful
lot of complications and you absolutely can *not* use rsync for that
unless you are doing it with checksums enabled, and even then it's
complicated- you probably don't want to sync across unlogged tables but
it's not easy to exclude those, or temp files/tables, you have to make
sure to manage the WAL properly, ensure that the appropriate information
makes it into the backup_label (you shouldn't be using exclusive backup
because a reboot of the primary at the wrong time will result in PG not
starting up on the primary...), etc, etc.

> We just tried barman, but it also seems to require a restore from the backup
> before being able to start the standby server (?), and we are afraid this
> would require double storage, IO and time for rebuilding the standby
> cluster.

I really think you should reconsider whatever backup solution you're
using today and rather than keeping it independent, make it part of the
solution to rebuilding replicas.

Maybe it isn't clear, so I'll try to explain- pgbackrest, if you use it
for your backups, will be able to restore over top of an existing PG
cluster, updating only those files which are different from what's in
the backup (based on checksums that it calculates), and is able to do so
in parallel, and then you can replay WAL from your pgbackrest repo,
right up until the replica is able to reconnect to the primary and
resume replaying WAL.  It's a pretty common approach and is supported by
HA solutions like patroni.

Thanks,

Stephen
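
To make the "updating only those files which are different" idea above a bit more concrete, here is a rough Python sketch of a checksum-based delta restore.  This is only an illustration of the general technique, not how pgbackrest actually implements it: the function names (sha1_of, build_manifest, delta_restore) are made up for this example, SHA-1 is an assumed choice of checksum, and the sketch works on plain directories.  It ignores everything that makes a real PostgreSQL restore hard (WAL replay, backup_label, unlogged tables, temp files, parallelism), so please don't use it in place of a real backup tool.

```python
import hashlib
import shutil
from pathlib import Path


def sha1_of(path: Path) -> str:
    """Return the SHA-1 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(backup_dir: Path) -> dict[str, str]:
    """Map each file's path (relative to backup_dir) to its checksum."""
    return {
        p.relative_to(backup_dir).as_posix(): sha1_of(p)
        for p in sorted(backup_dir.rglob("*"))
        if p.is_file()
    }


def delta_restore(backup_dir: Path, target_dir: Path,
                  manifest: dict[str, str]) -> list[str]:
    """Make target_dir match the backup, copying only files that differ.

    Returns the relative paths that had to be copied.
    """
    copied = []
    for rel, checksum in manifest.items():
        dest = target_dir / rel
        if dest.is_file() and sha1_of(dest) == checksum:
            continue  # content already matches the backup, leave it alone
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(backup_dir / rel, dest)
        copied.append(rel)

    # Files present in the target but absent from the backup are removed,
    # so the target ends up as a copy of the backup.
    for p in sorted(target_dir.rglob("*")):
        if p.is_file() and p.relative_to(target_dir).as_posix() not in manifest:
            p.unlink()
    return copied
```

Note that the checksums of the existing files are computed on the machine being restored, and the manifest comes from the backup, so in this arrangement the primary does not have to checksum anything at restore time.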