On Mon, Jul 8, 2019 at 11:57 AM Thorsten Schöning <tschoening@xxxxxxxxxx> wrote:
Hi all,
we are reviewing our current backup process, which is based on the
low-level pg_start_backup and pg_stop_backup functions using the
exclusive approach. I wonder how important the WAL archives created
during the backup really are, that is, whether they are necessary to
get Postgres up and working at all.
The docs mention that after pg_start_backup has been issued, files of
the data directory of Postgres can be copied however one likes. The
important point seems to be that pg_start_backup does checkpointing,
so that all data until the start of the backup gets written to disk.
Afterwards, additional writes can happen to any file at any given time
and changes are recorded using the WAL like normal.
What happens WITHOUT the WAL-archives created during the backup when a
cluster needs to be restored?
Then you have a corrupt and unusable backup.
The pg_start_backup/pg_stop_backup method for backups can *only* be used together with a working log archive.
From my understanding, the cluster restores as normal, but only up
until the point when pg_start_backup executed. Without additionally
shipping the WAL-archives later, one would simply lose the data
created after pg_start_backup has been called. But once the data
You would not just lose the data; your cluster will be corrupt.
OR are the copied files so fundamentally broken that Postgres is not
able to operate at all without the WAL-archives during backup?
This would be the case.
Wouldn't make much sense to me, because Postgres needs to operate
properly already to replay the WAL-archives. It needs to know from
which checkpoint to start, which is available after using
pg_start_backup. From my understanding, there's no info created by
pg_start_backup about additionally necessary WAL-archives blocking
bringing up the cluster successfully if not present. If none are
available, nothing gets replayed, but things still work.
No, without the WAL generated between start and stop backup, the cluster will be incomplete and corrupt. In your description you are, for example, not accounting for activity that happens *during* the checkpoint.
The information about which WAL blocks are necessary is generated during pg_stop_backup, not pg_start_backup.
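To make that concrete, here is a minimal Python sketch of how one might check an archive against the backup history file that pg_stop_backup writes. It assumes the usual "START WAL LOCATION" / "STOP WAL LOCATION" lines with a 24-character segment file name, the default 16 MB WAL segment size, and no timeline change during the backup. The function names are made up for illustration and are not part of any PostgreSQL tool.

```python
import re

# Assumption: default 16 MB WAL segments, so 256 segments per 4 GB "xlogid".
SEGMENTS_PER_XLOGID = 0x100000000 // (16 * 1024 * 1024)

# Matches e.g. "START WAL LOCATION: 0/2000028 (file 000000010000000000000002)"
_LOCATION = re.compile(
    r"^(START|STOP) WAL LOCATION: \S+ \(file ([0-9A-F]{24})\)", re.M
)


def required_segments(history_text):
    """Return the WAL segment names from the start to the stop of a backup."""
    found = dict(_LOCATION.findall(history_text))
    if "START" not in found or "STOP" not in found:
        raise ValueError("no START/STOP WAL LOCATION found in history file")
    start, stop = found["START"], found["STOP"]
    if start[:8] != stop[:8]:
        raise ValueError("timeline changed during backup; not handled here")
    timeline = start[:8]

    def segno(name):
        # Characters 8..16 are the high part, 16..24 the segment within it.
        return int(name[8:16], 16) * SEGMENTS_PER_XLOGID + int(name[16:24], 16)

    return [
        "%s%08X%08X" % (timeline, n // SEGMENTS_PER_XLOGID, n % SEGMENTS_PER_XLOGID)
        for n in range(segno(start), segno(stop) + 1)
    ]


def missing_segments(history_text, archived_names):
    """Return the required segments that are absent from the archive listing."""
    archived = set(archived_names)
    return [s for s in required_segments(history_text) if s not in archived]
```

If missing_segments() returns anything for a given backup, that backup cannot be brought to a consistent state. Note that the stop location only exists once pg_stop_backup has run, which is why nothing recorded by pg_start_backup can tell you what is needed.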
There is a reason these functions are labeled "low level APIs". They are designed to work together with other parts, like the log archiving, not to be a complete solution on their own. There are other tools available that provide all the plumbing, such as pg_basebackup, pgbackrest and pgbarman. If there are any doubts whatsoever on how they interact in your environment, you should *really* be looking at one of those higher level tools.
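For comparison, a sketch of what driving pg_basebackup from a script could look like. The helper name and its parameters are hypothetical; the options used (-D, -X stream, -c fast, -h, -U) are standard pg_basebackup options. With -X stream the WAL generated during the backup is fetched along with the data files, so the result is restorable on its own.

```python
def basebackup_command(target_dir, host=None, user=None):
    """Build a pg_basebackup command line that includes the needed WAL."""
    cmd = [
        "pg_basebackup",
        "-D", target_dir,  # directory to write the base backup into
        "-X", "stream",    # stream the WAL generated while the backup runs
        "-c", "fast",      # request an immediate checkpoint at backup start
    ]
    if host:
        cmd += ["-h", host]
    if user:
        cmd += ["-U", user]
    return cmd

# To actually run it (needs a reachable server and replication privileges):
#   import subprocess
#   subprocess.run(basebackup_command("/path/to/backup"), check=True)
```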
/Magnus