Greetings,

* Thorsten Schöning (tschoening@xxxxxxxxxx) wrote:
> we are currently implementing low level backups using pg_start_backup
> and pg_stop_backup. While the roadmap already contains moving to
> pg_basebackup or even barman, I would like to better understand some
> aspects of what's in use currently. The first thing one is advised to
> do in case of low level backups is the following:

I suppose first off, I wouldn't recommend trying to write your own
low-level backup tool, especially not as some kind of temporary
solution, as it sounds like you're suggesting doing here. What's the
issue with using one of the existing tools (of which there's quite a
few- pg_basebackup, barman, pgbackrest, wal-g, and more..)?

> > 1. Ensure that WAL archiving is enabled and working.
>
> https://www.postgresql.org/docs/current/continuous-archiving.html#BACKUP-BASE-BACKUP
>
> "archive_command" can be an arbitrary shell command copying WAL
> segments anywhere and between "pg_start_backup" and "pg_stop_backup"
> the docs make clear that arbitrary file system tools can be used as
> well to copy things around.

While possible to do so, I don't recommend it- as soon as the
archive_command returns, PG is free to remove/overwrite the original WAL
file, and therefore whatever is in archive_command really needs to
guarantee that the WAL file was durably written out (at least fsync'd
locally, which something like 'cp' won't do for you, or ideally sent to
a remote system and fsync'd there).

> So, when "archive_command" is enabled and working before
> "pg_start_backup" is executed, one has to deal with the I/O-load of
> copying the base files and creating the WAL archives at the same time.

The system has to deal with the load (be it I/O or CPU...) of creating
the WAL archives and storing them durably during the entire operation of
the cluster, if you want point-in-time-recovery. During a base backup,
you have the additional load from copying the data files as well, yes.

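To make the durability point about archive_command above a bit more
concrete: the sketch below shows roughly what "durably written out" means
for a local copy, as opposed to a plain 'cp'. It is only an illustration
under stated assumptions, not how any of the existing tools do it: the
function name archive_wal_segment and the directory layout are made up
here, a local POSIX filesystem is assumed, and shipping the segment to a
remote system is not covered at all.

```python
import os
import shutil


def archive_wal_segment(src_path: str, archive_dir: str) -> str:
    """Copy one WAL segment into archive_dir and fsync it before returning.

    Hypothetical helper for illustration; assumes a local POSIX filesystem.
    """
    name = os.path.basename(src_path)
    final_path = os.path.join(archive_dir, name)
    tmp_path = final_path + ".tmp"

    # Never overwrite a segment that has already been archived.
    if os.path.exists(final_path):
        raise FileExistsError(final_path)

    # Copy into a temporary name so a crash never leaves a partial
    # file under the final segment name.
    with open(src_path, "rb") as src, open(tmp_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
        dst.flush()
        # Force the file contents to stable storage; 'cp' skips this.
        os.fsync(dst.fileno())

    # Rename within the same directory, then fsync the directory so
    # the new directory entry itself is durable.
    os.rename(tmp_path, final_path)
    dir_fd = os.open(archive_dir, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)

    return final_path
```

A thin wrapper script would call this with the %p value that
archive_command provides and exit non-zero on any exception, so that PG
keeps the segment and retries later. Even then this only covers the
local case, which is one more reason to use one of the existing tools.
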
> But those latter archives are only needed before "pg_stop_backup" gets
> executed, depending on the given arguments even afterwards in theory.
> But before is enough already to save I/O.

I'm not following what you're talking about here.. The archives
generated during the backup are required when the backup is restored as
they're required to get the database back into a consistent state.

> So, looking at consistency of the backup, would it be OK to actually
> only archive WALs when copying base files has finished? Or is there a
> reason the docs put that at the first advice I obviously didn't
> understand yet?

Typically, point-in-time-recovery (PITR) is a desired part of doing
database backups and therefore you need to be archiving every WAL file
created. If what you're asking is- can you just have the WAL files
saved on the primary during the copying of the data files and then grab
them afterwards, then, yes, you can do that (and it's actually exactly
what pg_basebackup does in some modes). I haven't heard of that being a
requirement before- it was done in pg_basebackup for implementation
reasons, as I understand it, and not really because it was a much-needed
feature.

> From my understanding, without actually archiving WAL segments, the
> WAL would simply grow until archiving gets enabled, without any
> influence on if the backup is consistent or not. The only point is
> that one must not forget to enable WAL archiving at all and keep
> those segments created during backup "pg_stop_backup" tells about.

The WAL files required to make a backup consistent are *all* of those
generated between the pg_start_backup and the pg_stop_backup, just to be
clear.

Thanks,

Stephen