Re: Do results of pg_start_backup work without WAL segments created during backup?

Achilleas Mantzios <achill@xxxxxxxxxxxxxxxxxxxxx> · Mon, 8 Jul 2019 16:58:48 +0300

On 8/7/19 4:10 PM, Thorsten Schöning wrote:
Hello David Steele,
on Monday, 8 July 2019 at 14:12 you wrote:

pg_start_backup() does a checkpoint, but then the database continues
writing as you copy the files in whatever order you choose.  You may
copy a file that has a partial write or copy some files involved in a
transaction before it happens and others afterwards -- in fact this is
normal and expected.
And because that's expected, Postgres can successfully restore from
that, e.g. having used checkpoints before:

[...]This log exists primarily for crash-safety purposes: if the
system  crashes, the database can be restored to consistency by
“replaying” the log entries made since the last checkpoint.
https://www.postgresql.org/docs/current/continuous-archiving.html

"since the last checkpoint": Missing WAL-segments mean a loss of data
only. It doesn't mean that formerly "checkpointed" data gets magically
broken, else crash recovery wouldn't work like described in the docs.
The checkpoint is what brings WAL and data files in sync. If checkpoints are far apart, crash recovery is slower; if checkpoints are too frequent, your system gets slower. You have to
understand the checkpoint concepts before you even touch backups and PITR. So yes, after a crash, missing or corrupted WAL files will leave your system unusable (read about
pg_resetxlog, renamed pg_resetwal in PostgreSQL 10).
The checkpoint constrains the range of WAL that you need, but that WAL
is absolutely needed to reconstruct the changes that happened during the
backup.
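
To make the point above concrete, here is a toy sketch in Python. It is only an illustration under heavy assumptions: "pages" are dictionary entries and WAL records are (lsn, page, value) tuples, which is not how PostgreSQL stores either, and every name in it is hypothetical. It shows a file-by-file copy that ends up torn across a transaction, and how replaying the WAL written during the copy repairs it.

```python
# Toy model only: not PostgreSQL's real page or WAL format.
# WAL records are (lsn, page, new_value); all names are hypothetical.

def replay(pages, wal, from_lsn):
    """Return a copy of pages with every WAL record at or after from_lsn applied in order."""
    result = dict(pages)
    for lsn, page, value in wal:
        if lsn >= from_lsn:
            result[page] = value
    return result

# On-disk state right after the checkpoint done by pg_start_backup().
live = {"A": 0, "B": 0}
wal = []
checkpoint_lsn = 1

backup = {}
backup["A"] = live["A"]          # page A is copied first

# One transaction changes both pages while the copy is still running.
for lsn, page, value in [(1, "A", 10), (2, "B", 20)]:
    wal.append((lsn, page, value))
    live[page] = value

backup["B"] = live["B"]          # page B is copied afterwards

print(backup)                    # {'A': 0, 'B': 20}: half of the transaction
restored = replay(backup, wal, checkpoint_lsn)
print(restored == live)          # True, but only because the WAL was kept
```

Without the records in `wal`, the copy stays at A=0, B=20, a state the live database never had at any moment.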
Which makes sense if all WAL-archives are simply considered to be
incremental changes based on some former full backup. But that's the
point: I don't see how WAL-archives created between pg_start- and
pg_stop_backup are any different to later ones. Of course one needs
those to not lose data at all, but that doesn't tell anything about
how usable the data directory in itself is already without those.
You have to understand the distinction:
- an indefinite, uninterrupted sequence of WAL files: a nice "luxury" that enables PITR to any point in time
(of course this is no luxury for the average serious environment, but anyway, please read on)
- the uninterrupted sequence of WAL files delimited by pg_start_backup/pg_stop_backup: a necessity for backup recovery! A restore of this backup will need AT LEAST *ALL* of those files!
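
For the second case, a rough way to check that an archive really holds every segment between the start and stop of a backup could look like the Python sketch below. It assumes the default 16 MB WAL segment size (256 segments per "xlogid"), no timeline switch during the backup, and that you already know the first and last segment file names, for example from the backup history file that pg_stop_backup writes to the archive. The function names are made up for this example.

```python
# Sketch only: assumes the default 16 MB wal_segment_size and a single timeline.
SEGMENTS_PER_XLOGID = 0x100

def parse_segment(name):
    """Split a 24-hex-digit WAL file name into (timeline, absolute segment number)."""
    if len(name) != 24:
        raise ValueError("not a WAL segment file name: %r" % name)
    timeline = int(name[0:8], 16)
    log = int(name[8:16], 16)
    seg = int(name[16:24], 16)
    return timeline, log * SEGMENTS_PER_XLOGID + seg

def format_segment(timeline, segno):
    """Inverse of parse_segment()."""
    return "%08X%08X%08X" % (
        timeline, segno // SEGMENTS_PER_XLOGID, segno % SEGMENTS_PER_XLOGID)

def missing_segments(start_name, stop_name, available):
    """List the segments from start_name to stop_name (inclusive) not in available."""
    timeline, first = parse_segment(start_name)
    stop_timeline, last = parse_segment(stop_name)
    if timeline != stop_timeline:
        raise ValueError("timeline switch during backup is not handled here")
    have = set(available)
    wanted = (format_segment(timeline, n) for n in range(first, last + 1))
    return [name for name in wanted if name not in have]

archived = [
    "0000000100000000000000FE",
    "0000000100000000000000FF",
    "000000010000000100000001",
]
print(missing_segments("0000000100000000000000FE",
                       "000000010000000100000001",
                       archived))
# ['000000010000000100000000']
```

If that list is not empty, the base backup cannot be restored to a consistent state, no matter how many later segments you have.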

Postgres seems to have simply defined that they additionally care
about the time when a backup is running. Which is fine of course, but
I still don't see any technical or conceptual limitation of not
following that decision. If I backup some VM using snapshots, I don't
necessarily care about the changes made within the VM during the
backup as well. Those are simply handled by the next backup. But there
are additional products streaming all changes to the VM somewhere, if
one needs that.
If your VM snapshot guarantees an atomic snapshot of all file systems, then it is fine as a full file-level backup solution.
However, if you need PITR in between full backups, then it is fine to combine your VM snapshots with pg_start/stop_backup and have both full backups and PITR. We employ this solution, as well as
pgbackrest, for various servers.

OTOH, it's of course good to have two other opinions to mine when my
boss asks if things are OK the way they are. :-)

Best regards,

Thorsten Schöning

--
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt