On Mon, 2007-06-25 at 19:06 +0900, Koichi Suzuki wrote:
> Yeah, I agree we should carefully follow how Done really did a backup.
> My point is PostgreSQL may have to extend the file during the hot backup
> to write to the new block.

If the snapshot is a consistent, point-in-time copy, then I don't see how any I/O at all makes a difference. To my knowledge, both EMC and NetApp produce snapshots like this. IIRC, EMC calls these instant snapshots and NetApp calls them frozen snapshots.

> It is slightly different from Oracle's case.
> Oracle allocates all the database space in advance so that there could
> be no risk to modify the metadata on the fly.

I'm not really sure it's different. Oracle allows dynamic file extension, and I've seen no evidence that file extension is prevented during backup simply as a result of issuing the start hot backup command.

Oracle and DB2 both support a stop-I/O-to-the-database mode. My understanding is that this isn't required any more if you take an instant snapshot, so if people are already using instant snapshots with those products, it should certainly be safe for them to do the same with PostgreSQL.

Oracle is certainly more picky about snapshotted files than PostgreSQL is. In Oracle, each file has a header containing the LSN of the last checkpoint. At recovery time this is used to confirm that the backup is consistent, by requiring exactly equal LSNs across all files. PostgreSQL doesn't use file headers and doesn't store the LSN on a per-file basis, though we do store the LSN in the control file for the whole server.

> In our case, because SAN
> based storage snapshot is device level, not file system level, even a
> file system does not know that the snapshot is being taken and we might
> encounter the case where metadata and/or user data are not consistent.
> Such snapshot (whole filesystem) might be corrupted and cause file
> system level error.
>
> I'm interested in this. Any further comment/opinion is welcome.
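To make the file-header point in this reply concrete (Oracle checks a checkpoint LSN in every file's header; PostgreSQL keeps a single LSN in the control file for the whole server), here is a minimal Python sketch. It is a toy model under stated assumptions, not either product's actual recovery code: `DataFile`, `oracle_style_consistent` and `postgres_style_checkpoint_lsn` are hypothetical names, and LSNs are modelled as plain integers.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DataFile:
    """Hypothetical model of one file captured in a snapshot (not a real API)."""
    name: str
    # Oracle-style header LSN of the last checkpoint; None models a
    # PostgreSQL data file, which carries no such per-file header.
    header_checkpoint_lsn: Optional[int]


def oracle_style_consistent(files: List[DataFile]) -> bool:
    """Simplified Oracle-style rule: every file must report the same LSN."""
    lsns = {f.header_checkpoint_lsn for f in files}
    return len(lsns) == 1 and None not in lsns


def postgres_style_checkpoint_lsn(control_file_lsn: int, files: List[DataFile]) -> int:
    """Simplified PostgreSQL-style view: the checkpoint LSN is one server-wide
    value from the control file, so there is nothing per-file to compare."""
    # `files` is deliberately unused: there is no per-file LSN to check.
    return control_file_lsn
```

Under this model, a snapshot whose files carry different header LSNs fails the Oracle-style check, while the PostgreSQL-style view has no per-file state to compare and relies on WAL replay instead.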
If you can either i) show me an error that occurs after the full and correct PostgreSQL hot backup procedures have been executed, or ii) present a conjecture that explains in detail how a device-level error might occur, then I will look into this further.

-- 
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
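For reference, the ordering that the "full and correct PostgreSQL hot backup procedures" imply for a device-level snapshot can be sketched in Python as below. This is a minimal sketch assuming WAL archiving (`archive_command`) is already configured and that the WAL archived between the two calls is kept with the snapshot. `run_sql` and `take_snapshot` are hypothetical hooks standing in for a database connection and a SAN snapshot command. `pg_start_backup()`/`pg_stop_backup()` are the function names in PostgreSQL releases of this era (newer releases renamed them).

```python
from typing import Callable


def hot_backup_with_snapshot(
    run_sql: Callable[[str], object],   # hypothetical: executes one SQL statement
    take_snapshot: Callable[[], str],   # hypothetical: triggers the SAN snapshot, returns its id
    label: str = "san_snapshot",
) -> str:
    """Take a device-level snapshot between pg_start_backup() and pg_stop_backup()."""
    safe_label = label.replace("'", "''")  # escape quotes for the SQL string literal
    # 1. Enter backup mode: the server checkpoints and writes a backup label.
    run_sql(f"SELECT pg_start_backup('{safe_label}');")
    try:
        # 2. Snapshot the data directory while backup mode is active.
        snapshot_id = take_snapshot()
    finally:
        # 3. Always leave backup mode, even if the snapshot step failed.
        run_sql("SELECT pg_stop_backup();")
    return snapshot_id
```

The `try`/`finally` is a design choice of this sketch: a failed snapshot should not leave the server stuck in backup mode.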