Re: [PATCH] direct I/O fallback sync simplification

Christoph Hellwig wrote:
> In the case of direct I/O falling back to buffered I/O we sync data
> twice currently: once at the end of generic_file_buffered_write using
> filemap_write_and_wait_range and once a little later in
> __generic_file_aio_write using do_sync_mapping_range with all flags set.
> 
> The wait before write of the do_sync_mapping_range call does not make
> any sense, so just keep the filemap_write_and_wait_range call and move
> it to the right spot.

Are you sure this is an expectation of O_DIRECT?

A few notes from the net, including some documentation from IBM,
advise using O_DIRECT|O_DSYNC if you need sync when direct I/O falls
back to buffered on some other OSes.

IBM (about AIX I believe):

    http://publib.boulder.ibm.com/infocenter/systems/index.jsp?topic=/com.ibm.aix.genprogc/doc/genprogc/fileio.htm

    Direct I/O and Data I/O Integrity Completion

    Although direct I/O writes are done synchronously, they do not
    provide synchronized I/O data integrity completion, as defined by
    POSIX. Applications that need this feature should use O_DSYNC in
    addition to O_DIRECT. O_DSYNC guarantees that all of the data and
    enough of the metadata (for example, indirect blocks) have written
    to the stable store to be able to retrieve the data after a system
    crash. O_DIRECT only writes the data; it does not write the
    metadata.
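
As a rough illustration of the O_DIRECT|O_DSYNC combination the IBM text recommends, here is a minimal userspace sketch. The helper name write_block_dsync, the 4096-byte alignment, and the retry without O_DIRECT on EINVAL are assumptions made for the example, not anything taken from the patch or from the AIX documentation.

```c
#define _GNU_SOURCE           /* exposes O_DIRECT in <fcntl.h> on glibc */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0            /* assumption: degrade to buffered I/O where unavailable */
#endif

#define BLOCK_SIZE 4096       /* assumed alignment; real code should query the device */

/*
 * Hypothetical helper: write one aligned block with O_DIRECT|O_DSYNC.
 * Returns 0 on success, -1 on error.
 */
int write_block_dsync(const char *path)
{
    void *buf;
    int fd, rc = -1;

    /* O_DIRECT generally needs an aligned buffer, offset and length. */
    if (posix_memalign(&buf, BLOCK_SIZE, BLOCK_SIZE) != 0)
        return -1;
    memset(buf, 0xab, BLOCK_SIZE);

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT | O_DSYNC, 0600);
    if (fd < 0 && errno == EINVAL)
        /* Filesystem rejects O_DIRECT at open time: keep O_DSYNC only. */
        fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DSYNC, 0600);

    if (fd >= 0) {
        /* With O_DSYNC, the write returns only after data integrity completion. */
        if (pwrite(fd, buf, BLOCK_SIZE, 0) == BLOCK_SIZE)
            rc = 0;
        if (close(fd) != 0)
            rc = -1;
    }
    free(buf);
    return rc;
}
```

With O_DSYNC set, the application gets the integrity guarantee whether the kernel services the write directly or through the page cache.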

From an earlier thread, "O_DIRECT and barriers":

Theodore Tso wrote:
> On Fri, Aug 21, 2009 at 10:26:35AM -0400, Christoph Hellwig wrote:
> > > It turns out that applications needing integrity must use fdatasync or
> > > O_DSYNC (or O_SYNC) *already* with O_DIRECT, because the kernel may
> > > choose to use buffered writes at any time, with no signal to the
> > > application.
> >
> > The fallback was a relatively recent addition to the O_DIRECT semantics
> > for broken filesystems that can't handle holes very well.  Fortunately
> > enough we do force O_SYNC (that is Linux O_SYNC aka Posix O_DSYNC)
> > semantics for that already.
> 
> Um, actually, we don't.  If we did that, we would have to wait for a
> journal commit to complete before allowing the write(2) to complete,
> which would be especially painfully slow for ext3.

There's no point in a "half-sync".  Nobody expects or can usefully
depend on it.  So imho we should drop the filemap_write_and_wait_range
entirely when O_DSYNC is not set.

O_DIRECT without syncing in the buffered fallback will be a useful
performance optimisation for applications (including virtual machines)
which do sequences of writes interspersed with fdatasync calls on
sparse files, or when extending files, or to filesystems which don't
implement O_DIRECT.

Since they need fdatasync anyway, even with direct I/O, to get
integrity on some hardware, that's a sensible coding pattern.
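
A minimal sketch of that coding pattern, from the application side: a run of extending O_DIRECT writes, with fdatasync as the only integrity points. The function name write_with_sync_points, the 4096-byte block size, and the retry without O_DIRECT when open fails with EINVAL are assumptions for illustration only.

```c
#define _GNU_SOURCE           /* exposes O_DIRECT in <fcntl.h> on glibc */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0            /* assumption: degrade to buffered I/O where unavailable */
#endif

#define BLOCK_SIZE 4096       /* assumed alignment; real code should query the device */

/*
 * Hypothetical helper: write nblocks aligned blocks to a new file and call
 * fdatasync() after every sync_every blocks. Returns 0 on success, -1 on error.
 */
int write_with_sync_points(const char *path, int nblocks, int sync_every)
{
    void *buf;
    int fd, i, rc = -1;

    if (sync_every <= 0 || posix_memalign(&buf, BLOCK_SIZE, BLOCK_SIZE) != 0)
        return -1;
    memset(buf, 0x5a, BLOCK_SIZE);

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0600);
    if (fd < 0 && errno == EINVAL)
        /* Filesystem does not implement O_DIRECT: use buffered writes. */
        fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        goto out;

    for (i = 0; i < nblocks; i++) {
        /* Extending writes: a case where direct I/O may fall back to buffered. */
        if (pwrite(fd, buf, BLOCK_SIZE, (off_t)i * BLOCK_SIZE) != BLOCK_SIZE)
            goto out_close;
        /* Integrity point: nothing before this is assumed to be on stable storage. */
        if ((i + 1) % sync_every == 0 && fdatasync(fd) != 0)
            goto out_close;
    }

    /* Final integrity point covering any trailing writes. */
    if (fdatasync(fd) == 0)
        rc = 0;

out_close:
    if (close(fd) != 0)
        rc = -1;
out:
    free(buf);
    return rc;
}
```

An application written this way never relies on an individual write being durable when it returns, so an implicit sync in the buffered fallback would only add latency between its fdatasync calls.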

-- Jamie