Re: Snapshot backups

Tom Lane <tgl@xxxxxxxxxxxxx> · Wed, 31 Jul 2013 01:13:50 -0400

Jeff Janes <jeff.janes@xxxxxxxxx> writes:
> On Tuesday, July 30, 2013, James Sewell wrote:
>> I understand what you are saying, and I understand how the backup_label
>> works - but I still don't understand why the pg_start and pg_stop commands
>> are REQUIRED when doing a snapshot backup to ensure data integrity.
>> 
>> Surely not using them and restoring a snapshot is the same as starting
>> after a crash, and will result in log replay to get to the latest possible
>> consistent state?

> That's true provided that all of your data is on a single volume, and you
> trust your "snapshot" to be free of bugs.

James stated to begin with that his data was spread across multiple
volumes, so it's definitely not safe for him to omit
pg_start_backup/pg_stop_backup.

Perhaps it would help to consider what can happen when you're not using
filesystem snapshots at all, but just an ordinary userspace backup program
such as "tar".  The difference between recovering from a tar backup and
crash recovery is that, because the backup is taken over an extended
period of time, it may contain a set of data that does not match any
possible instantaneous state of the on-disk data --- and crash recovery
only promises to deal with the latter.  Here is a concrete example of what
can happen:

1. The tar process copies the file for table foo.  There are changes to
foo in Postgres' shared buffers that haven't made it to disk yet (although
those changes are committed and recorded in on-disk WAL), so the copy made
by tar isn't entirely up to date.

2. PG's checkpoint process starts a checkpoint run.  Along the way,
it flushes out the changes to table foo.  When done, it updates the
last-checkpoint pointer in pg_control, which tells where crash recovery
would need to start replaying WAL.

3. The tar process archives pg_control.

Now, if you restore the tar backup onto a new system and start up
Postgres, you will have an obsolete copy of table foo --- and WAL replay
will not apply the needed updates to foo, because it will start from the
point in WAL that pg_control says it should start from, and that's after
the WAL records that describe the missing changes.  Note that this will
fail even if you assume you've got perfectly good and complete copies
of the WAL files; there's a whole 'nother set of hazards if you don't.

For recovery from a tar backup to work, the archived copy of pg_control
must point to a spot in the WAL sequence that is before any changes that
could possibly not yet appear in any archived data files.  The purpose of
pg_start_backup/pg_stop_backup is to provide the synchronization needed to
meet this requirement.

The comparable case isn't possible for crash recovery, assuming that the
OS and storage hardware implement fsync() correctly, because we'll have
fsync'd the changes to foo down to disk before updating pg_control.

Now, if you instead take a filesystem snapshot (representing some
instantaneous state of the disk contents) and run "tar" to copy that,
you have a good backup, because you must have a copy of pg_control that
will tell you to re-apply any changes that are missing from the data
files, as well as WAL files that contain the needed records.  However,
this is only certain if all that data is on *one* filesystem, because
otherwise you can't be sure you have mutually consistent snapshots.
And you're vulnerable to any bugs in the filesystem's snapshot
implementation that might give you inconsistent copies of different
files.  (Such bugs would probably be closely related to bugs in fsync
... but that doesn't mean they're necessarily exactly the same.)

So that's the long form of Jeff's comment above.  Any clearer now?

			regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general