On Mon, Aug 31, 2015 at 05:40:04AM -0700, Sage Weil wrote:
> On Mon, 31 Aug 2015, Dave Chinner wrote:
> > After taking a tangent to find a tracepoint regression that was
> > getting in my way, I found that there was a significant pause
> > between the inode locking calls within xfs_file_fsync and the inode
> > locking calls on the buffered write. Roughly 8ms, in fact, on almost
> > every call. After adding a couple more test trace points into the
> > XFS fsync code, it turns out that a hardware cache flush is causing
> > the delay. That is, because we aren't doing log writes that trigger
> > cache flushes and FUA writes, we have to issue a
> > blkdev_issue_flush() call from xfs_file_fsync and that is taking 8ms
> > to complete.
>
> This is where my understanding of block layer flushing really breaks
> down, but in both cases we're issuing flush requests to the hardware,
> right? Is the difference that the log write is a FUA flush request
> with data, and blkdev_issue_flush() issues a flush request without
> associated data?

Pretty much, though the log write also does a cache flush before the
FUA write. i.e. the log writes consist of a bio with data issued via:

	submit_bio(REQ_FUA | REQ_FLUSH | WRITE_SYNC, bio);

while blkdev_issue_flush consists of an empty bio issued via:

	submit_bio(REQ_FLUSH | WRITE_SYNC, bio);

So from a block layer and filesystem point of view there is little
difference, and the only difference at the SCSI layer is the WRITE w/
FUA that is issued after the cache flush in the log write case (see
https://lwn.net/Articles/400541/ for a bit more background).

I haven't looked any deeper than this so far - I don't have time
right now to do so...

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
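
For anyone wanting to reproduce the measurement Dave describes, here is a
minimal userspace sketch (not from the thread itself; the file path under
/mnt/test and the iteration count are assumptions). It repeatedly does a
small buffered overwrite followed by fsync(), which on XFS goes through
xfs_file_fsync() and, when no log write carrying REQ_FLUSH/REQ_FUA is
needed, falls back to blkdev_issue_flush() - the per-iteration latency it
prints is dominated by the hardware cache flush being discussed:

/*
 * fsync-probe.c: time each fsync() after a small buffered overwrite.
 * Sketch only; path and iteration count are arbitrary assumptions.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
	const char *path = "/mnt/test/fsync-probe";	/* assumed XFS mount */
	char buf[4096];
	struct timespec t0, t1;
	int fd, i;

	memset(buf, 'x', sizeof(buf));
	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	for (i = 0; i < 100; i++) {
		/* overwrite in place: no metadata change, so no log write */
		if (pwrite(fd, buf, sizeof(buf), 0) != sizeof(buf)) {
			perror("pwrite");
			return 1;
		}
		clock_gettime(CLOCK_MONOTONIC, &t0);
		if (fsync(fd) < 0) {	/* cache flush is issued here */
			perror("fsync");
			return 1;
		}
		clock_gettime(CLOCK_MONOTONIC, &t1);
		printf("fsync %3d: %ld us\n", i,
		       (long)((t1.tv_sec - t0.tv_sec) * 1000000L +
			      (t1.tv_nsec - t0.tv_nsec) / 1000));
	}
	close(fd);
	return 0;
}

On a device with a volatile write cache each iteration should show a
flush-dominated latency (~8ms on the hardware above); with the cache
disabled or on a cache-less device the numbers should drop sharply.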