Re: What's the dfifference between pg_start_backup+copy+pg_stop_backup+WAL vs. pg_start_backup+pg_stop_backup+copy+WAL?

Stephen Frost <sfrost@xxxxxxxxxxx> · Fri, 23 Jul 2021 14:10:16 -0400

Greetings,

* Thorsten Schöning (tschoening@xxxxxxxxxx) wrote:
> Guten Tag Stephen Frost,
> am Freitag, 23. Juli 2021 um 17:24 schrieben Sie:
> 
> > [...]Similar, once you reach the BACKUP_END record of
> > a backup you know the database is consistent- but only if all of the
> > files that PG was writing to were copied between the start checkpoint
> > of the backup and the BACKUP_END record in the WAL.
> 
> So I am correct that the recovery process starts at what is documented
> as CHECKPOINT LOCATION in the file backup_label? If the data directory
> looks like a more current checkpoint might be available, that is NOT
> used? Instead, all available WALs are replayed?

Yes, the recovery has to start from the checkpoint in the backup label.
The fact that pg_control might have a more recent checkpoint is exactly
the problem- any files which were copied before the checkpoint in
pg_control wouldn't have the correct contents and wouldn't be fixed by
the recovery process.  Furthermore, if PG were to stop replaying before
getting to the correct backup-end location then the database might not
be consistent at that point because some records had been written into
the WAL as committed but hadn't made it to the heap yet.  There's risk
on both sides.

> But how come the state of other data files into the recovery process?
> that's what Laurenz Albe's example is about: A data file which is more
> current than expected because that file is only copied, but the
> associated changes are stored in WALs after pg_stop_backup and
> therefore need to be considered lost.

I'm not sure that you're quite understanding the example because we
always write to WAL first and sync that before writing to the heap, so
there isn't a case of a "more recent" file which is somehow ahead of
what's in the WAL.

> Why is the data file with that additional row in B used at all? The
> additional row is associated with a transaction number not available
> when restoring based on CHECKPOINT LOCATION + WALs of pg_stop_backup.

The data file is copied because it's part of the database..?  If you
didn't copy it, then you'd be missing a lot of data.  As for the
question about if that row would be visible, it certainly might be- but
it depends on when the transaction log files were copied, or if perhaps
the tuple ended up being frozen somehow.

> So why is that row used at all instead of simply ignored? That would
> actually revert the cluster to the state of pg_stop_backup, but as
> said, backup outdate anyway. I still don't get why the result is not
> as consistent as e.g. a crash at the same point in time instead of
> doing a backup.

How would you know to ignore it if that tuple was frozen?

A crashed PG cluster isn't consistent until WAL replay from the last
checkpoint to the ending point of WAL has completed.  That's exactly the
same with a backup, but the checkpoint you have to start from is the one
stored in the backup label and the point in the WAL that has to be
reached for consistency is the one that's at the backup-end location in
the WAL.  Otherwise they're both the same.  The reason you have to start
from the start-backup checkpoint is exactly because the data files are
being copied over a period of time and not all at once- unlike with a
crash where everything stops at the same time (and if that doesn't
happen, you definitely do have the risk of having corruption from such a
partial-crash, though we've tried our best to detect such cases and deal
with them, but we need the kernel to actually tell us when there's a
problem and that's been something of an interesting challenge..).

> Mit freundlichen Grüßen
> 
> Thorsten Schöning

Seriously, when your signature is twice the size of your email, it's too
much.  Please cut it down when posting to these lists, we certainly
don't need hundreds of copied of it in our archives or everyone's mail
boxes.

Thanks,

Stephen
Attachment:
signature.asc

Description: PGP signature