Hi Dan!

On Tue 30-07-19 16:49:41, Dan Williams wrote:
> Eduardo raised a puzzling question about why dax yields lower iops
> than direct-i/o. The expectation is the reverse, i.e. that direct-i/o
> should be slightly slower than dax due to block layer overhead. This
> holds true for xfs, but on ext4 dax yields half the iops of direct-i/o
> for an fio 4K random write workload.
>
> Here is a relative graph of ext4: dax + direct-i/o vs xfs: dax + direct-i/o
>
> https://user-images.githubusercontent.com/56363/62172754-40c01e00-b2e8-11e9-8e4e-29e09940a171.jpg
>
> A relative perf profile seems to show more time in
> ext4_journal_start() which I thought may be due to atime or mtime
> updates, but those do not seem to be the source of the extra journal
> I/O.
>
> The urgency is a curiosity at this point, but I expect an end user
> might soon ask whether this is an expected implementation side-effect
> of dax.
>
> Thanks in advance for any insight, and/or experiment ideas for us to go try.

Yeah, I think the reason is that ext4_iomap_begin() currently starts a
transaction unconditionally for each write, whereas ext4_direct_IO_write()
is more clever and starts a transaction only when it needs to allocate
blocks. We could put similar smarts into ext4_iomap_begin(), and it's
probably a good idea; just at this moment I'm working with one guy on
moving the ext4 direct IO code to the iomap infrastructure, which
overhauls ext4_iomap_begin() anyway, so let's do this after that work.

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR