On Fri, Aug 21, 2009 at 10:26:35AM -0400, Christoph Hellwig wrote:
> > It turns out that applications needing integrity must use fdatasync or
> > O_DSYNC (or O_SYNC) *already* with O_DIRECT, because the kernel may
> > choose to use buffered writes at any time, with no signal to the
> > application.
>
> The fallback was a relatively recent addition to the O_DIRECT semantics
> for broken filesystems that can't handle holes very well.  Fortunately
> enough we do force O_SYNC (that is Linux O_SYNC aka Posix O_DSYNC)
> semantics for that already.

Um, actually, we don't.  If we did that, we would have to wait for a
journal commit to complete before allowing the write(2) to complete,
which would be especially painfully slow for ext3.

This question recently came up on the ext4 developers' list, because
of a question of how direct I/O to a preallocated (uninitialized)
extent should be handled.  Are we supposed to guarantee synchronous
updates of the metadata by the time write(2) returns, or not?  One of
the ext4 developers (I can't remember if it was Mingming or Eric)
asked an XFS developer what they did in that case, and I believe the
answer they were given was that XFS started a commit, but did *not*
wait for the commit to complete before returning from the Direct I/O
write.  In fact, they were told (I believe this was from an SGI
engineer, but I don't remember the name; we can track that down if
it's important) that if an application wanted to guarantee metadata
would be updated for an extending write, they had to use fsync() or
O_SYNC/O_DSYNC.

Perhaps they were given an incorrect answer, but it's clear that the
semantics of exactly how Direct I/O works in edge cases aren't well
defined, or at least not clearly and widely understood.
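
To make the application-side consequence concrete, here is a minimal
sketch (not taken from any of the filesystems discussed) of a direct
I/O write that does not rely on O_DIRECT alone for integrity.  The
helper name dio_write_block() and the 4096-byte DIO_ALIGN are
assumptions for illustration only; the real alignment requirement
depends on the device and filesystem.  It retries with a buffered
descriptor if the filesystem rejects O_DIRECT, and calls fdatasync()
either way.

```c
#define _GNU_SOURCE           /* exposes O_DIRECT in <fcntl.h> on Linux/glibc */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0            /* not exposed here: degrade to buffered I/O */
#endif

#define DIO_ALIGN 4096        /* assumed alignment; real devices may differ */

/*
 * Hypothetical helper: write one aligned block filled with `fill` to
 * `path` using O_DIRECT, then call fdatasync() explicitly instead of
 * assuming the direct write was synchronous.
 * Returns 0 on success, -1 on any failure.
 */
int dio_write_block(const char *path, char fill)
{
    void *buf = NULL;
    int fd;
    int rc = -1;

    /* O_DIRECT needs an aligned buffer and an aligned transfer size. */
    if (posix_memalign(&buf, DIO_ALIGN, DIO_ALIGN) != 0)
        return -1;
    memset(buf, fill, DIO_ALIGN);

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0644);
    if (fd < 0 && errno == EINVAL) {
        /* Filesystem rejects O_DIRECT: retry with a buffered descriptor. */
        fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    }
    if (fd < 0)
        goto out;

    /* A short write is treated as a failure in this sketch. */
    if (write(fd, buf, DIO_ALIGN) != DIO_ALIGN)
        goto out_close;

    /*
     * This is an extending write, so the file size changes.  Ask the
     * kernel to flush the data and the metadata needed to read it back.
     */
    if (fdatasync(fd) != 0)
        goto out_close;

    rc = 0;

out_close:
    if (close(fd) != 0)
        rc = -1;
out:
    free(buf);
    return rc;
}
```

A caller would treat a return of -1 as "not known to be on stable
storage".  Opening the file with O_DSYNC instead of calling
fdatasync() is the other option mentioned above.
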

I have an early draft (for discussion only) of what we think it means
and what is currently implemented in Linux, which I've put up (again,
let me emphasize) for *discussion* here:

http://ext4.wiki.kernel.org/index.php/Clarifying_Direct_IO's_Semantics

Comments are welcome, either on the wiki's talk page, or directly to
me, or to the linux-fsdevel or linux-ext4 lists.

						- Ted