On Fri, Aug 21, 2009 at 06:08:52PM -0400, Theodore Tso wrote:
> On Fri, Aug 21, 2009 at 10:26:35AM -0400, Christoph Hellwig wrote:
> > > It turns out that applications needing integrity must use fdatasync or
> > > O_DSYNC (or O_SYNC) *already* with O_DIRECT, because the kernel may
> > > choose to use buffered writes at any time, with no signal to the
> > > application.
> >
> > The fallback was a relatively recent addition to the O_DIRECT semantics
> > for broken filesystems that can't handle holes very well. Fortunately
> > enough we do force O_SYNC (that is Linux O_SYNC aka Posix O_DSYNC)
> > semantics for that already.
>
> Um, actually, we don't. If we did that, we would have to wait for a
> journal commit to complete before allowing the write(2) to complete,
> which would be especially painfully slow for ext3.
>
> This question recently came up on the ext4 developers' list, because
> of a question of how direct I/O to a preallocated (uninitialized)
> extent should be handled. Are we supposed to guarantee synchronous
> updates of the metadata by the time write(2) returns, or not? One of
> the ext4 developers (I can't remember if it was Mingming or Eric)
> asked an XFS developer what they did in that case, and I believe the
> answer they were given was that XFS started a commit, but did *not*
> wait for the commit to complete before returning from the Direct I/O
> write. In fact, they were told (I believe this was from an SGI
> engineer, but I don't remember the name; we can track that down if
> it's important) that if an application wanted to guarantee metadata
> would be updated for an extending write, they had to use fsync() or
> O_SYNC/O_DSYNC.

That would have been Eric asking me. My answer was that O_DIRECT does
not imply any new data integrity guarantees associated with a
write(2) call - it just avoids system caches. You get the same
guarantees of resiliency as a non-O_DIRECT write(2) call at
completion - it may or may not be there if you crash.
If you want some guarantee of integrity, then you need to use O_DSYNC,
O_SYNC or call f[data]sync(2) just like all other IO.

Also, note that direct IO is not necessarily synchronous - you can
do asynchronous direct IO.....

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
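
[Illustration, not part of the original message.] The point of the reply
can be sketched in a few lines: O_DIRECT only bypasses the page cache,
and durability comes from the explicit fdatasync(2) that follows the
write. This is a minimal sketch in Python on Linux, not anyone's
reference implementation. The names durable_write, _write_and_sync and
BLOCK are hypothetical, and the 4096-byte alignment is an assumption;
real code should query the device for its alignment requirements. The
payload is zero-padded to a whole block, and the sketch retries without
O_DIRECT when the filesystem rejects it with EINVAL.

```python
import errno
import mmap
import os

BLOCK = 4096  # assumed alignment for buffer and length (hypothetical value)
O_DIRECT = getattr(os, "O_DIRECT", 0)  # 0 on platforms without O_DIRECT


def _write_and_sync(path, flags, buf):
    """Write the whole buffer at offset 0, then flush it to stable storage."""
    fd = os.open(path, flags, 0o644)
    try:
        if os.write(fd, buf) != len(buf):
            raise OSError(errno.EIO, "short write")
        # This call, not O_DIRECT, is what provides the integrity guarantee.
        os.fdatasync(fd)
    finally:
        os.close(fd)


def durable_write(path, payload):
    """Write payload (zero-padded to a BLOCK multiple) and fdatasync it.

    Returns True if the write went through O_DIRECT, False if the
    filesystem refused O_DIRECT and a buffered write was used instead.
    """
    size = max(1, -(-len(payload) // BLOCK)) * BLOCK
    buf = mmap.mmap(-1, size)  # anonymous mapping: page-aligned, zero-filled
    try:
        buf[:len(payload)] = payload
        base = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
        for extra in (O_DIRECT, 0):
            try:
                _write_and_sync(path, base | extra, buf)
                return bool(extra)
            except OSError as exc:
                # Retry buffered only when O_DIRECT itself was rejected.
                if exc.errno != errno.EINVAL or not extra:
                    raise
    finally:
        buf.close()
```

Calling durable_write("data.bin", b"hello") leaves a 4096-byte file on
disk whether or not the direct path was taken; in both cases the data is
only guaranteed to survive a crash once fdatasync has returned. Opening
with O_DSYNC instead of calling fdatasync would be the other option the
reply mentions.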