On Wed, 18 Jun 2008, Sam Mason wrote:
> Isn't fsync only a side-effect of having a write-back cache between programs and the disk? This means its only purpose is to ensure that the cache is consistent with what's on disk. Because all programs running within a system are running on top of the cache, they don't know or care whether the cache actually matches up to the disk.
Most programs don't, but PostgreSQL does. It writes to the database in two stages: first to the WAL, followed by an fsync, then later to the main database files. You can't trust that the WAL will be around for recovery until that first fsync returns. The checkpoint process makes sure everything that went into the WAL has then made it to the main database files, and again it doesn't trust that the data is really on disk until the fsync returns.
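
As a rough illustration of the "write, then don't trust it until fsync returns" rule, here is a minimal Python sketch. It is not how PostgreSQL implements its WAL; the function name append_durably and the file layout are made up for this example.

```python
import os

def append_durably(path, record):
    """Append record (bytes) to path; return only after fsync has completed.

    Hypothetical helper for illustration: a caller that treats the record
    as safe before this function returns is trusting the OS cache, not
    the disk.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        os.write(fd, record)   # lands in the OS write-back cache
        os.fsync(fd)           # ask the OS to push it to stable storage
    finally:
        os.close(fd)
```

A caller would log the change with append_durably first and only afterwards modify the main data file, which is the ordering described above.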
> Therefore, if I understand things correctly, the state of fsync shouldn't matter in this use case. It's equally borken independent of the state of fsync.
Quite borken indeed, and fsync has nothing to do with it. The theory proposed is that since no writes were done, the backup should be consistent. This is quite wrong. The most obvious case showing that is one where a time-driven checkpoint occurred (as happens every 5 minutes by default) while you were in the middle of backing up. Let's say the main database files are backed up before the checkpoint, but the backup is still working through some giant archival table. The checkpoint happens; it updates the earlier files that are already in the backup. The checkpoint finishes and erases the WAL logs. Now the backup makes its way to the WAL files. You're screwed when you try to recover this database from the backup. The backed-up database files don't have the latest updates, and the WAL can't recover them because it already cleared out its copy of them, thinking they weren't needed anymore. You'll be lucky to get the database to start at all: it's missing data you thought was committed before the backup started, and who knows what subtle corruption you'll find.
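
The sequence above can be mimicked with a toy model. This is only a sketch: dicts and lists stand in for files, the table names are invented, and none of it is PostgreSQL code.

```python
def simulate_backup(checkpoint_during_backup):
    """Toy model of a file-by-file backup racing a checkpoint."""
    data = {"small_table": "old", "archive_table": "old"}
    wal = ["small_table=new"]   # committed change, not yet in the data files
    backup = {}

    # The backup copies the small file first.
    backup["small_table"] = data["small_table"]

    if checkpoint_during_backup:
        data["small_table"] = "new"   # checkpoint flushes the change
        wal.clear()                   # and discards the WAL it no longer needs

    # The backup is still busy with the big table, then reaches the WAL.
    backup["archive_table"] = data["archive_table"]
    backup_wal = list(wal)

    # "Recovery": replay whatever WAL the backup captured.
    for record in backup_wal:
        key, value = record.split("=")
        backup[key] = value
    return backup
```

Without the checkpoint, replaying the copied WAL brings small_table up to "new". With the checkpoint in the middle, the backup holds the old file and an empty WAL, so the committed change is gone.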
Now, in reality, even time-driven checkpoints don't do anything if there hasn't been activity, so it may very well be the case that any one database backup is fine. But you can't ignore the requirement to do a pg_start_backup before making a filesystem-level backup and expect you'll get that lucky--sooner or later you will get a backup that won't restore if you keep that up.
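
For the ordering only, a wrapper around the copy step might look like the sketch below. The callables run_sql and copy_data_dir are hypothetical placeholders for however you talk to the server and copy files; pg_start_backup and pg_stop_backup are the function names as of the releases under discussion here, and the rest of the online-backup procedure (WAL archiving in particular) is not shown.

```python
def filesystem_backup(run_sql, copy_data_dir, label="fs-backup"):
    """Bracket a filesystem-level copy with the backup start/stop calls.

    run_sql(sql, params) and copy_data_dir() are supplied by the caller;
    both are assumptions of this sketch, not a real client API.
    """
    run_sql("SELECT pg_start_backup(%s)", (label,))
    try:
        copy_data_dir()
    finally:
        # Always leave backup mode, even if the copy failed.
        run_sql("SELECT pg_stop_backup()", ())
```

The try/finally is there so a failed copy does not leave the server thinking a backup is still in progress.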
-- * Greg Smith gsmith@xxxxxxxxxxxxx http://www.gregsmith.com Baltimore, MD