On Fri, Aug 2, 2019 at 7:43 AM Jan Kara <jack@xxxxxxx> wrote:
>
> Hi Dan!
>
> On Tue 30-07-19 16:49:41, Dan Williams wrote:
> > Eduardo raised a puzzling question about why dax yields lower iops
> > than direct-i/o. The expectation is the reverse, i.e. that direct-i/o
> > should be slightly slower than dax due to block layer overhead. This
> > holds true for xfs, but on ext4 dax yields half the iops of direct-i/o
> > for an fio 4K random write workload.
> >
> > Here is a relative graph of ext4: dax + direct-i/o vs xfs: dax + direct-i/o
> >
> > https://user-images.githubusercontent.com/56363/62172754-40c01e00-b2e8-11e9-8e4e-29e09940a171.jpg
> >
> > A relative perf profile seems to show more time in
> > ext4_journal_start() which I thought may be due to atime or mtime
> > updates, but those do not seem to be the source of the extra journal
> > I/O.
> >
> > The urgency is a curiosity at this point, but I expect an end user
> > might soon ask whether this is an expected implementation side-effect
> > of dax.
> >
> > Thanks in advance for any insight, and/or experiment ideas for us to go try.
>
> Yeah, I think the reason is that ext4_iomap_begin() currently starts a
> transaction unconditionally for each write whereas ext4_direct_IO_write()
> is more clever and starts a transaction only when needing to allocate any
> blocks. We could put similar smarts into ext4_iomap_begin() and it's
> probably a good idea, just at this moment I'm working with one guy on
> moving ext4 direct IO code to iomap infrastructure which overhauls
> ext4_iomap_begin() anyway, so let's do this after that work.

Sounds good, thanks for the insight!