On Tue, Nov 20, 2018 at 08:17:38AM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@xxxxxxxxxx>
>
> When we write into an unwritten extent via direct IO, we dirty
> metadata on IO completion to convert the unwritten extent to
> written. However, when we do the FUA optimisation checks, the inode
> may be clean and so we issue a FUA write into the unwritten extent.
> This means we then bypass the generic_write_sync() call after
> unwritten extent conversion has been done and we don't force the
> modified metadata to stable storage.
>
> This violates O_DSYNC semantics. The window of exposure is a single
> IO, as the next DIO write will see the inode has dirty metadata and
> hence will not use the FUA optimisation. Calling
> generic_write_sync() after completion of the second IO will also
> sync the first write and its metadata.
>
> Fix this by avoiding the FUA optimisation when writing to unwritten
> extents.

Ouch, yes.  We can't skip the log force when converting an unwritten
extent.

If we really cared we could try to use FUA and only do the log force
instead of needing a full device flush, but that would require a fair
amount of work.  So this looks good:

Reviewed-by: Christoph Hellwig <hch@xxxxxx>
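
For anyone following along, the problem described above can be modelled in a few lines of userspace C. This is only a rough sketch of the decision logic as the commit message describes it, not the actual iomap code: every type, field and function name below is made up for illustration, and the "before" check is simplified to the two conditions mentioned in the description (clean inode, device supports FUA).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not the kernel's real types. */
enum extent_type { EXTENT_MAPPED, EXTENT_UNWRITTEN };

struct dio_write {
	enum extent_type type;	/* state of the extent being written */
	bool inode_dirty;	/* inode already has dirty metadata */
	bool dev_has_fua;	/* device supports FUA writes */
};

/* Check as described before the fix: only inode state and device matter. */
bool use_fua_before(const struct dio_write *w)
{
	return !w->inode_dirty && w->dev_has_fua;
}

/* Check after the fix: never use FUA when writing to an unwritten extent. */
bool use_fua_after(const struct dio_write *w)
{
	if (w->type == EXTENT_UNWRITTEN)
		return false;
	return !w->inode_dirty && w->dev_has_fua;
}

/* Unwritten extent conversion dirties metadata at IO completion. */
bool completion_dirties_metadata(const struct dio_write *w)
{
	return w->type == EXTENT_UNWRITTEN;
}

/*
 * In this model, taking the FUA path means the sync after completion is
 * skipped.  O_DSYNC is violated if completion dirtied metadata anyway.
 */
bool dsync_violated(const struct dio_write *w, bool used_fua)
{
	return used_fua && completion_dirties_metadata(w);
}

/* Walk through the case from the commit message. */
void check_sketch(void)
{
	struct dio_write w = {
		.type = EXTENT_UNWRITTEN,
		.inode_dirty = false,
		.dev_has_fua = true,
	};

	assert(dsync_violated(&w, use_fua_before(&w)));	/* the bug */
	assert(!dsync_violated(&w, use_fua_after(&w)));	/* the fix */
}
```

In the model, a write to a clean inode over an unwritten extent takes the FUA path under the old check and so leaves the conversion metadata unsynced, while the new check falls back to the sync path. The FUA-plus-log-force alternative mentioned in the review is not modelled here.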