On Tue, Oct 22, 2013 at 1:10 PM, Shaun Thomas <sthomas@xxxxxxxxxxxxxxxx> wrote:
>> So you can grab the extra files, but you can't make it apply them,
>> as you are telling it that it doesn't need to.
> Do I have to, though? Replaying transaction logs is baked into the crash recovery system. If I interrupt it in the middle of a checkpoint, it should be able to revert to the previous checkpoint that did succeed.
True, but it needs to know that it needs to do that.
> By including the extra WAL files, it would re-apply them, just like in a crash recovery.
> Of course, that only works if I interrupt it by shutting the replica down. By backing up across a checkpoint, I run the risk of a race condition where some files were backed up before the checkpoint, and others afterwards. Which raises the question: isn't that risk the same with a regular backup? The database doesn't just stop checkpointing because a backup is in progress.
The backup_label file records the checkpoint that occurred inside the pg_start_backup() call and is not updated with subsequent checkpoints. It acts as an alternative control file, forcing recovery to start out at that checkpoint rather than some later one which was completed and recorded into the real control file while the backup was underway.
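To make that override concrete, here is a simplified Python model, not PostgreSQL's actual code, of recovery picking its starting checkpoint. The label lines follow the START WAL LOCATION / CHECKPOINT LOCATION / START TIME / LABEL layout (later releases also write BACKUP METHOD and BACKUP FROM lines, which this sketch ignores). The LSNs, the label text, and the helper names are all made up for illustration.

```python
from typing import Optional


def parse_lsn(text: str) -> int:
    """Convert an 'X/Y' WAL location (two hex halves) into one 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def parse_backup_label(contents: str) -> dict:
    """Pull the two WAL locations out of backup_label text."""
    fields = {}
    for line in contents.splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            fields[key] = value
    # "START WAL LOCATION: 0/9000028 (file ...)": keep only the LSN part.
    start = fields["START WAL LOCATION"].split()[0]
    return {
        "start_wal": parse_lsn(start),
        "checkpoint": parse_lsn(fields["CHECKPOINT LOCATION"]),
    }


def recovery_start_checkpoint(label_contents: Optional[str], control_checkpoint: int) -> int:
    """Simplified model: a backup_label, if present, overrides pg_control.

    control_checkpoint stands for the latest checkpoint in the real control
    file, which may have moved on while the backup was running.
    """
    if label_contents is not None:
        return parse_backup_label(label_contents)["checkpoint"]
    return control_checkpoint


# Illustrative label; every value in it is invented.
SAMPLE_LABEL = """\
START WAL LOCATION: 0/9000028 (file 000000010000000000000009)
CHECKPOINT LOCATION: 0/9000060
START TIME: 2013-10-22 13:10:00 EDT
LABEL: nightly
"""

# pg_control has advanced to 0/C000060, but the label still wins.
print(hex(recovery_start_checkpoint(SAMPLE_LABEL, parse_lsn("0/C000060"))))  # 0x9000060
```

Without the label, the same call would return the control file's later checkpoint, which is exactly the wrong starting point for a backup whose files were copied across it.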
This is one of the advantages of pg_basebackup: because it injects backup_label directly into the backup (where it is needed) without creating it on the master (where it is not needed, other than as a way to make sure it ends up in the backup), a master that crashes during a pg_basebackup run will start recovery from its last eligible checkpoint. With pg_start_backup(), the backup_label left on the master instead forces recovery to start from the pg_start_backup() checkpoint. Not only does using that earlier checkpoint cause extra work, it also runs the risk that some of the WAL needed to start from it has already been recycled, in which case the server refuses to start until someone manually intervenes by deleting the backup_label file.
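The recycled-WAL failure can be modeled in a few lines. This sketch assumes the default 16 MB segment size and the 24-hex-digit segment file naming (timeline, high half of the LSN, segment within it). `can_start_recovery` is a hypothetical simplification that only checks whether the segment holding the redo point is still in pg_xlog; the real server has to read the checkpoint record and all the WAL after it.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # default segment size; an assumption here


def parse_lsn(text: str) -> int:
    """Convert an 'X/Y' WAL location into one 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def wal_file_name(timeline: int, lsn: int, seg_size: int = WAL_SEGMENT_SIZE) -> str:
    """Name of the WAL segment file that contains the given location."""
    segno = lsn // seg_size
    segs_per_xlogid = 0x100000000 // seg_size
    return "%08X%08X%08X" % (timeline, segno // segs_per_xlogid, segno % segs_per_xlogid)


def can_start_recovery(timeline: int, redo_lsn: int, available: set) -> bool:
    """Hypothetical check: is the segment holding the redo point still around?"""
    return wal_file_name(timeline, redo_lsn) in available


# Invented scenario: segments 09..0B were recycled after later checkpoints.
pg_xlog_contents = {"00000001000000000000000C", "00000001000000000000000D"}

# Stale backup_label left on the master by pg_start_backup(): old redo point.
print(can_start_recovery(1, parse_lsn("0/9000028"), pg_xlog_contents))  # False
# No label on the master (the pg_basebackup case): latest checkpoint's redo point.
print(can_start_recovery(1, parse_lsn("0/C000028"), pg_xlog_contents))  # True
```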
> There must be some internal detail I'm missing.
> Either way, I'll add a routine to stall the standby backup until the restartpoint corresponding to the pg_start_backup has been replayed. I'll see if that helps.
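One way such a stall routine might look, as a sketch under several assumptions: the target is the CHECKPOINT LOCATION from the master's backup_label, the standby's progress is read from the "Latest checkpoint location" line of `pg_controldata` (English-language output assumed), and polling every few seconds is acceptable. The function names are hypothetical, and sleep/clock are injectable only so the loop can be exercised without waiting.

```python
import subprocess
import time
from typing import Callable


def parse_lsn(text: str) -> int:
    """Convert an 'X/Y' WAL location into one 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def latest_checkpoint(controldata_output: str) -> int:
    """Extract 'Latest checkpoint location' from pg_controldata's text output."""
    for line in controldata_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Latest checkpoint location":
            return parse_lsn(value.strip())
    raise ValueError("no 'Latest checkpoint location' line found")


def run_pg_controldata(datadir: str) -> str:
    """Run pg_controldata against the standby's data directory."""
    return subprocess.run(
        ["pg_controldata", datadir], check=True, capture_output=True, text=True
    ).stdout


def wait_for_restartpoint(
    target_lsn: int,
    read_controldata: Callable[[], str],
    timeout: float = 600.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Block until the standby's latest restartpoint is at or past target_lsn."""
    deadline = clock() + timeout
    while True:
        current = latest_checkpoint(read_controldata())
        # A later restartpoint also qualifies: the standby does not create
        # one for every checkpoint record it replays.
        if current >= target_lsn:
            return current
        if clock() >= deadline:
            raise TimeoutError("standby restartpoint still behind target")
        sleep(interval)
```

In use, `read_controldata` could be something like `lambda: run_pg_controldata("/path/to/standby/data")`, where the path is a placeholder. Taking a callable rather than a path keeps the polling logic separate from how the control data is fetched.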
A possible alternative would be to fake a backup_label file which contains the pointer to the restartpoint that was known-good at the time the master was put into backup mode. If you have full_page_writes off, that would be a problem. There may be other problems with it that I'm unaware of, and it seems like running with scissors.
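Purely to show what such a faked file would hold, and with the scissors and full_page_writes caveats above fully in force, here is a hypothetical renderer. It assumes the redo and checkpoint locations are taken from the standby's `pg_controldata` ("Latest checkpoint's REDO location" and "Latest checkpoint location") at the moment the master entered backup mode, and it omits the BACKUP METHOD and BACKUP FROM lines later releases write. It is an illustration of the idea, not a tested or recommended procedure.

```python
def parse_lsn(text: str) -> int:
    """Convert an 'X/Y' WAL location into one 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def format_lsn(lsn: int) -> str:
    """Inverse of parse_lsn: 64-bit integer back to 'X/Y' form."""
    return "%X/%X" % (lsn >> 32, lsn & 0xFFFFFFFF)


def wal_file_name(timeline: int, lsn: int, seg_size: int = 16 * 1024 * 1024) -> str:
    """WAL segment file name for a location, assuming 16 MB segments."""
    segno = lsn // seg_size
    per_id = 0x100000000 // seg_size
    return "%08X%08X%08X" % (timeline, segno // per_id, segno % per_id)


def render_backup_label(timeline: int, redo_lsn: int, checkpoint_lsn: int,
                        start_time: str, label: str) -> str:
    """Hypothetical: backup_label text pointing at a chosen restartpoint."""
    lines = [
        "START WAL LOCATION: %s (file %s)"
        % (format_lsn(redo_lsn), wal_file_name(timeline, redo_lsn)),
        "CHECKPOINT LOCATION: %s" % format_lsn(checkpoint_lsn),
        "START TIME: %s" % start_time,
        "LABEL: %s" % label,
    ]
    return "\n".join(lines) + "\n"


print(render_backup_label(1, parse_lsn("0/9000028"), parse_lsn("0/9000060"),
                          "2013-10-22 13:10:00 EDT", "faked from standby"))
```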
Cheers,
Jeff