From: Stephen Frost <sfrost@xxxxxxxxxxx>
Sent: Thursday, March 30, 2023 4:59 AM
To: Laurenz Albe <laurenz.albe@xxxxxxxxxxx>
Cc: oernii+pg@xxxxxxxxxx <oernii+pg@xxxxxxxxxx>; pgsql-admin@xxxxxxxxxxxxxxxxxxxx <pgsql-admin@xxxxxxxxxxxxxxxxxxxx>
Subject: Re: pg15 and VM snapshots with pg_backup_start
Greetings,

* Laurenz Albe (laurenz.albe@xxxxxxxxxxx) wrote:
> On Thu, 2023-03-30 at 11:50 +0200, oernii+pg@xxxxxxxxxx wrote:
> > Hi, we run a lot of PG14 and lower. We do VM snapshots, which we back up.
> > Before the snapshot we run a quiesce script which also runs pg_start_backup().
> > After a few seconds the snapshot is finished and we call pg_stop_backup.
> > This is working fine for us, but pg15 removed this function and replaced it
> > with pg_backup_start(), which however requires the session to be open, otherwise we get:
> > WARNING: aborting backup due to backend exiting before pg_backup_stop was called

That exclusive backup mode was removed because if the system crashed
between the start and stop backup calls, you would be left in an ugly
state where a backup_label file exists but the data directory isn't
actually a backup, and the database system wouldn't start. While you may
never have had this happen, the risk was certainly there and you were
just lucky not to have hit it yet.
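
With the pg15 functions, the session that calls pg_backup_start() has to
stay connected until pg_backup_stop() returns. A minimal sketch of a
quiesce wrapper that does this is below. It is an illustration, not a
tested backup tool: the function name, the take_snapshot callback and the
label are hypothetical, and it assumes a DB-API connection that uses
%s-style parameters (as psycopg2 does).

```python
from contextlib import closing


def snapshot_with_backup_session(conn, take_snapshot, label="vm-snapshot"):
    """Run take_snapshot() between pg_backup_start() and pg_backup_stop().

    conn is an already-open DB-API connection (assumption: %s placeholders,
    e.g. psycopg2). take_snapshot is a hypothetical callable that triggers
    the VM/storage snapshot and returns once it is complete.
    The same connection is used for both calls and is never closed in between.
    """
    with closing(conn.cursor()) as cur:
        # Start the (non-exclusive) backup; this session must stay open.
        cur.execute("SELECT pg_backup_start(%s, fast => true)", (label,))
        start_lsn = cur.fetchone()[0]

        # Take the snapshot while the backup is in progress.
        take_snapshot()

        # Stop the backup; the backup_label contents are only returned here.
        cur.execute(
            "SELECT lsn, labelfile, spcmapfile "
            "FROM pg_backup_stop(wait_for_archive => true)"
        )
        stop_lsn, labelfile, spcmapfile = cur.fetchone()

    return start_lsn, stop_lsn, labelfile, spcmapfile
```

The returned labelfile text has to be stored alongside the snapshot, since
it is not written into the data directory for you. If take_snapshot()
raises, pg_backup_stop() is never called and the backup is aborted when the
connection goes away, which is the safe outcome.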

> > For pg15 do I need to keep the session open, will the snapshot contain uncorrupted data?
>
> If the snapshot is atomic, that is, represents a state that the file system had
> at some point in time, PostgreSQL can always use it for recovery.

Right, but the entire set of database files must be included in the
single atomic snapshot, and you really shouldn't try to use such
snapshots for PITR, because you absolutely must replay all of the WAL
from the snapshot to the end to be sure you reach consistency.

> Otherwise, you cannot use this backup, because you only get "backup_label" from
> "pg_backup_stop()", and without "backup_label", you cannot recover the backup.

You must absolutely also be doing WAL archiving properly and verify that
all of the WAL generated between the backup start and backup stop was
stored, to be sure that you have a restorable backup. Then you also need
to be sure to have the correct backup_label for the specific backup and
to restore that file when performing the restore.
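
As a rough illustration of the WAL check, the sketch below lists the WAL
segment files that cover the range between the start and stop LSNs and
reports any that are missing from an archive directory. Everything here
is an assumption for the example: the helper names are made up, the
segment size is the default 16MB, the timeline is passed in by the
caller, and the archive is assumed to hold uncompressed files under their
plain segment names.

```python
import os

WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: default wal_segment_size


def parse_lsn(lsn):
    """Convert an LSN such as '0/2000028' into an integer byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_segment_name(timeline, segno, seg_size=WAL_SEGMENT_SIZE):
    """Build the 24-hex-digit WAL file name for a segment number."""
    segs_per_id = 0x100000000 // seg_size
    return "%08X%08X%08X" % (timeline, segno // segs_per_id, segno % segs_per_id)


def required_segments(start_lsn, stop_lsn, timeline=1, seg_size=WAL_SEGMENT_SIZE):
    """Segment names covering the WAL from start_lsn up to stop_lsn."""
    first = parse_lsn(start_lsn) // seg_size
    # A stop LSN exactly on a segment boundary belongs to the previous
    # segment, so step back one byte before dividing.
    last = max(first, (parse_lsn(stop_lsn) - 1) // seg_size)
    return [wal_segment_name(timeline, n, seg_size) for n in range(first, last + 1)]


def missing_segments(archive_dir, start_lsn, stop_lsn, timeline=1):
    """Return the required segment names not present in archive_dir."""
    present = set(os.listdir(archive_dir))
    needed = required_segments(start_lsn, stop_lsn, timeline)
    return [name for name in needed if name not in present]
```

An empty list from missing_segments() only says the files are there under
the expected names; it does not check their contents, which is part of
why a dedicated backup tool is the better option.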

All in all, this approach of taking a snapshot and then copying it has
little advantage and many disadvantages compared with a more traditional
approach to backing up the data, so I don't really recommend it. I'd
strongly recommend using one of the purpose-built backup solutions which
exist and provide things like verifiable backups that have checksums for
the data backed up, ensure that all necessary WAL was copied, etc.

Thanks,
Stephen