Jeff Janes <jeff.janes@xxxxxxxxx> writes: > On Tuesday, July 30, 2013, James Sewell wrote: >> I understand what you are saying, and I understand how the backup_label >> works - but I still don't understand why the pg_start and pg_stop commands >> are REQUIRED when doing a snapshot backup to ensure data integrity. >> >> Surely not using them and restoring a snapshot is the same as starting >> after a crash, and will result in log replay to get to the latest possible >> consistent state? > That's true provided that all of your data is on a single volume, and you > trust your "snapshot" to be free of bugs. James stated to begin with that his data was spread across multiple volumes, so it's definitely not safe for him to omit pg_start_backup/pg_stop_backup. Perhaps it would help to consider what can happen when you're not using filesystem snapshots at all, but just an ordinary userspace backup program such as "tar". The difference between recovering from a tar backup and crash recovery is that, because the backup is taken over an extended period of time, it may contain a set of data that does not match any possible instantaneous state of the on-disk data --- and crash recovery only promises to deal with the latter. Here is a concrete example of what can happen: 1. The tar process copies the file for table foo. There are changes to foo in Postgres' shared buffers that haven't made it to disk yet (although those changes are committed and recorded in on-disk WAL), so the copy made by tar isn't entirely up to date. 2. PG's checkpoint process starts a checkpoint run. Along the way, it flushes out the changes to table foo. When done, it updates the last-checkpoint pointer in pg_control, which tells where crash recovery would need to start replaying WAL. 3. The tar process archives pg_control. Now, if you restore the tar backup onto a new system and start up Postgres, you will have an obsolete copy of table foo --- and WAL replay will not apply the needed updates to foo, because it will start from the point in WAL that pg_control says it should start from, and that's after the WAL records that describe the missing changes. Note that this will fail even if you assume you've got perfectly good and complete copies of the WAL files; there's a whole 'nother set of hazards if you don't. For recovery from a tar backup to work, the archived copy of pg_control must point to a spot in the WAL sequence that is before any changes that could possibly not yet appear in any archived data files. The purpose of pg_start_backup/pg_stop_backup is to provide the synchronization needed to meet this requirement. The comparable case isn't possible for crash recovery, assuming that the OS and storage hardware implement fsync() correctly, because we'll have fsync'd the changes to foo down to disk before updating pg_control. Now, if you instead take a filesystem snapshot (representing some instantaneous state of the disk contents) and run "tar" to copy that, you have a good backup, because you must have a copy of pg_control that will tell you to re-apply any changes that are missing from the data files, as well as WAL files that contain the needed records. However, this is only certain if all that data is on *one* filesystem, because otherwise you can't be sure you have mutually consistent snapshots. And you're vulnerable to any bugs in the filesystem's snapshot implementation that might give you inconsistent copies of different files. (Such bugs would probably be closely related to bugs in fsync ... but that doesn't mean they're necessarily exactly the same.) So that's the long form of Jeff's comment above. Any clearer now? regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general