Christoph Hellwig wrote:
> In the case of direct I/O falling back to buffered I/O we sync data
> twice currently: once at the end of generic_file_buffered_write using
> filemap_write_and_wait_range and once a little later in
> __generic_file_aio_write using do_sync_mapping_range with all flags set.
>
> The wait before write of the do_sync_mapping_range call does not make
> any sense, so just keep the filemap_write_and_wait_range call and move
> it to the right spot.

Are you sure this is an expectation of O_DIRECT?

A few notes from the net, including some documentation from IBM, advise
using O_DIRECT|O_DSYNC if you need sync when direct I/O falls back to
buffered on some other OSes.

IBM (about AIX I believe):
http://publib.boulder.ibm.com/infocenter/systems/index.jsp?topic=/com.ibm.aix.genprogc/doc/genprogc/fileio.htm

Direct I/O and Data I/O Integrity Completion

Although direct I/O writes are done synchronously, they do not provide
synchronized I/O data integrity completion, as defined by POSIX.
Applications that need this feature should use O_DSYNC in addition to
O_DIRECT. O_DSYNC guarantees that all of the data and enough of the
metadata (for example, indirect blocks) have written to the stable
store to be able to retrieve the data after a system crash. O_DIRECT
only writes the data; it does not write the metadata.

From an earlier thread, "O_DIRECT and barriers":

Theodore Tso wrote:
> On Fri, Aug 21, 2009 at 10:26:35AM -0400, Christoph Hellwig wrote:
> > > It turns out that applications needing integrity must use fdatasync or
> > > O_DSYNC (or O_SYNC) *already* with O_DIRECT, because the kernel may
> > > choose to use buffered writes at any time, with no signal to the
> > > application.
> >
> > The fallback was a relatively recent addition to the O_DIRECT semantics
> > for broken filesystems that can't handle holes very well. Fortunately
> > enough we do force O_SYNC (that is Linux O_SYNC aka Posix O_DSYNC)
> > semantics for that already.
>
> Um, actually, we don't. If we did that, we would have to wait for a
> journal commit to complete before allowing the write(2) to complete,
> which would be especially painfully slow for ext3.

There's no point in a "half-sync". Nobody expects it or can usefully
depend on it.

So imho we should drop the filemap_write_and_wait_range call entirely
when O_DSYNC is not set.

O_DIRECT without syncing in the buffered fallback will be a useful
performance optimisation for applications (including virtual machines)
which do sequences of writes interspersed with fdatasync calls on
sparse files, or when extending files, or to filesystems which don't
implement O_DIRECT.

Since they need fdatasync anyway, even with direct I/O, to get
integrity on some hardware, that's a sensible coding pattern.

-- Jamie
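
For illustration, the coding pattern described above (a run of O_DIRECT
writes followed by a single fdatasync) might look roughly like the
sketch below. The function name write_batch_then_sync, the 4096-byte
BLOCK alignment and the user-space retry without O_DIRECT on EINVAL are
all assumptions made for the example; none of them comes from the patch
or from any kernel interface beyond open(2), write(2) and fdatasync(2).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* glibc only exposes O_DIRECT with this */
#endif
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0           /* not exposed here: degrade to buffered I/O */
#endif

#define BLOCK 4096           /* assumed alignment; real code should query it */

/*
 * Hypothetical helper: write nblocks aligned blocks, then make them
 * durable with one fdatasync() instead of syncing after every write.
 * Returns 0 on success, -1 on any error.
 */
int write_batch_then_sync(const char *path, int nblocks)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0644);
    if (fd < 0 && errno == EINVAL) {
        /* Filesystem refuses O_DIRECT: retry with plain buffered I/O. */
        fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    }
    if (fd < 0)
        return -1;

    /* O_DIRECT needs an aligned buffer and aligned transfer sizes. */
    void *buf = NULL;
    if (posix_memalign(&buf, BLOCK, BLOCK) != 0) {
        close(fd);
        return -1;
    }
    memset(buf, 'x', BLOCK);

    int rc = 0;
    for (int i = 0; i < nblocks; i++) {
        /* Each write may be direct or silently buffered by the kernel. */
        if (write(fd, buf, BLOCK) != BLOCK) {
            rc = -1;
            break;
        }
    }

    /* The integrity point: data is only guaranteed stable after this. */
    if (rc == 0 && fdatasync(fd) != 0)
        rc = -1;

    free(buf);
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

The application only relies on the data being stable once fdatasync
returns, so an extra sync inside each fallback write would cost time
without adding any guarantee it depends on.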