Ron Mayer wrote:
> Marco Colombo wrote:
>> Ron Mayer wrote:
>>> Greg Smith wrote:
>>>> There are some known limitations to Linux fsync that I remain somewhat
>>>> concerned about, independently of LVM, like "ext3 fsync() only does a
>>>> journal commit when the inode has changed" (see
>>>> http://kerneltrap.org/mailarchive/linux-kernel/2008/2/26/990504 )....
>>> I wonder if there should be an optional fsync mode
>>> in postgres should turn fsync() into
>>> fchmod (fd, 0644); fchmod (fd, 0664);
> 'course I meant: "fchmod (fd, 0644); fchmod (fd, 0664); fsync(fd);"
>>> to work around this issue.
>> Question is... why do you care if the journal is not flushed on fsync?
>> Only the file data blocks need to be, if the inode is unchanged.
>
> You don't - but ext3 fsync won't even push the file data blocks
> through a disk cache unless the inode was changed.
>
> The point is that ext3 only does the "write barrier" processing
> that issues the FLUSH CACHE (IDE) or SYNCHRONIZE CACHE (SCSI)
> commands on inode changes, not data changes. And with no FLUSH
> CACHE or SYNCHRONIZE CACHE the data blocks may sit in the disk's
> cache after the fsync() as well.

Yes, but we knew it already, didn't we? It's always been like that:
with IDE disks and write-back cache enabled, fsync just waits for the
disk to report completion, and disks lie about that.

Write barriers enforce ordering: WHEN writes are committed to disk,
they will be in order, but that doesn't mean NOW. Ordering is enough
for a FS journal; the only requirement there is consistency.

Anyway, it's the block device's job to control disk caches. A
filesystem is just a client of the block device: it posts a flush
request, and what happens next depends on the block device code. The
FS doesn't talk to disks directly. And a write barrier is not a flush
request; it's a "please do not reorder" request. On fsync(), ext3
issues a flush request to the block device, and that's all it's
expected to do.
Now, some block devices may implement write barriers by issuing FLUSH
commands to the disk, but that's another matter. A FS shouldn't rely
on that. You can replace a barrier with a flush (not as efficiently),
but not the other way around. If a block device driver issues a FLUSH
for a barrier, but doesn't issue a FLUSH for a flush, well, it's a
buggy driver, IMHO.

.TM.
--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general