On Tue, Aug 15, 2006 at 03:02:56PM -0400, Michael Stone wrote:
> On Tue, Aug 15, 2006 at 02:33:27PM -0400, mark@xxxxxxxxxxxxxx wrote:
> >On Tue, Aug 15, 2006 at 01:26:46PM -0400, Michael Stone wrote:
> >>On Tue, Aug 15, 2006 at 11:29:26AM -0500, Jim C. Nasby wrote:
> >>>Are 'we' sure that such a setup can't lose any data?
> >>Yes. If you check the archives, you can even find the last time this was
> >>discussed...
> >
> >I looked last night (coincidence actually) and didn't find proof that
> >you cannot lose data.
>
> You aren't going to find proof, any more than you'll find proof that you
> won't lose data if you do use a journalling fs. (Because there isn't
> any.) Unfortunately, many people misunderstand what a metadata
> journal does for you, and overstate its importance in this type of
> application.
>
> >How do you deal with the file system structure being updated before the
> >data blocks are (re-)written?
>
> *That's what the postgres log is for.* If the latest xlog entries don't
> make it to disk, they won't be replayed; if they didn't make it to
> disk, the transaction would not have been reported as committed. An
> application that understands filesystem semantics can guarantee data
> integrity without metadata journaling.

So what causes files to get 'lost' and end up in lost+found? AFAIK
that's because the file's data was written before its metadata. Now, if
fsync'ing a file also ensures that all of its metadata is written, then
we're probably fine... if not, then we're at risk every time we create a
new file (every WAL segment if archiving is on, and every time a
relation passes a 1GB boundary).

FWIW, the way that FreeBSD gets around the need to fsck a dirty
filesystem before use, without using a journal, is to ensure that
metadata operations are always on the drive before the actual data is
written. There's still a need to fsck a dirty filesystem, but it can now
be done in the background, with the filesystem mounted and in use.
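
To make the fsync question above concrete: on Linux, the fsync(2) man
page says that fsync on a file does not necessarily flush the directory
entry that names it, and that an explicit fsync on the directory is
needed for that. Below is a minimal sketch of that pattern in Python,
assuming a POSIX system where a directory can be opened read-only and
fsync'ed. The function name create_durably is made up for illustration;
this is not a description of what PostgreSQL itself does, and whether
the second fsync is required depends on the filesystem and mount
options.

```python
import os


def create_durably(path: str, data: bytes) -> None:
    """Create a new file, then flush its contents and its directory entry.

    Hypothetical helper for illustration; assumes a POSIX system.
    """
    # O_EXCL makes this fail if the file already exists, so we only
    # ever deal with the "new file" case discussed above.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        offset = 0
        while offset < len(data):
            # os.write may write fewer bytes than requested.
            offset += os.write(fd, data[offset:])
        # Flush the file's data (and its inode) to stable storage.
        os.fsync(fd)
    finally:
        os.close(fd)

    # Flush the parent directory so the new name itself is on disk.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

If the directory fsync is skipped, a crash could in principle leave the
data blocks allocated with no name pointing at them, which is the kind
of orphan that fsck moves into lost+found.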
> >>The bottom line is that the only reason you need a metadata journalling
> >>filesystem is to save the fsck time when you come up. On a little
> >>partition like xlog, that's not an issue.
> >
> >fsck isn't only about time to fix. fsck is needed, because the file system
> >is broken.
>
> fsck is needed to reconcile the metadata with the on-disk allocations.
> To do that, it reads all the inodes and their corresponding directory
> entries. The time to do that is proportional to the size of the
> filesystem, hence the comment about time. fsck is not needed "because
> the filesystem is broken", it's needed because the filesystem is marked
> dirty.
>
> Mike Stone

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@xxxxxxxxxxxxx
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461