Re: [PATCH, RFC] ext4: Fix use of write barrier in commit logic

Jamie Lokier <jamie@xxxxxxxxxxxxx> · Tue, 20 May 2008 16:23:10 +0100

Theodore Tso wrote:
> As I mentioned earlier, adding blkdev_issue_flush() to ext[34]/fsync.c
> is I believe not necessary.  We should be doing the right thing in the
> commit.c file.  In the future, if we want some extra bonus points, in
> the case where writes have taken place to the inode, but no metadata
> operations have taken place (the common case when a database is
> writing to a pre-existing tablespace), it would be nice if fsync()
> could notice this case, force out the datablocks the old-fashioned way
> without forcing an journal commit, and then calling
> blkdev_issue_flush().  That should give us some nice performance wins
> for database workloads; unfortunately it probably won't do us any good
> on mailserver workloads.

Last time I looked, the database write + fsync case did not result in
a journal commit in some cases, and therefore no blkdev_issue_flush().
The problem was one of correctness.  Has this been fixed?

A database had no way to issue a hard disk barrier in these cases
without doing weird stuff like forcing metadata changes prior to fsync
(e.g. toggling permissions bits), which causes an intolerable two disk
seeks per fsync.  That is _way_ expensive.

There is also no way in ext3 to _just_ fdatasync() (no metadata even
if it has changed), with disk barrier/flush.

Imho a good place to call blkdev_issue_flush() is in the VFS, after
it's written all the data blocks, unless the filesystem has a better
override.  That would work with most filesystems automatically.
Request queue optimisations may trivially remove redundant flushes.

-- Jamie
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html