Hi all,

Eduardo raised a puzzling question about why DAX yields lower IOPS than direct I/O. The expectation is the reverse: direct I/O should be slightly slower than DAX because of block-layer overhead. That holds true for xfs. On ext4, however, DAX yields half the IOPS of direct I/O for an fio 4K random-write workload. Here is a graph of relative performance, ext4 (DAX vs. direct I/O) alongside xfs (DAX vs. direct I/O):

https://user-images.githubusercontent.com/56363/62172754-40c01e00-b2e8-11e9-8e4e-29e09940a171.jpg

A relative perf profile seems to show more time spent in ext4_journal_start(). I thought that might be due to atime or mtime updates, but those do not appear to be the source of the extra journal I/O.

There is no urgency; this is a curiosity at this point. Still, I expect an end user may soon ask whether this is an expected side effect of the DAX implementation. Thanks in advance for any insight and/or ideas for experiments we could try.

Eduardo collected perf reports of these runs here:

https://github.com/pmem/ndctl/files/3449231/linux_5.3.2_perf.zip

...and the fio configuration is here:

https://gist.github.com/djbw/e5e69cbccbaaf0f43ecde127393c305c
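For a quick cross-check outside fio, a crude single-threaded version of the 4K random-write loop might look like the sketch below. It is a hypothetical helper (`random_write_iops` and its parameters are made up for illustration) and is not the fio job in the gist above. It runs at queue depth 1, issues no fsync, and preallocates the file with posix_fallocate. `direct=True` adds O_DIRECT (Linux only), and the page-aligned anonymous mmap buffer satisfies O_DIRECT's alignment requirement.

```python
import mmap
import os
import random
import time

BLOCK_SIZE = 4096  # 4K blocks, matching the workload described above


def random_write_iops(path, file_size=64 << 20, num_writes=10_000,
                      direct=False, seed=0):
    """Write num_writes 4K blocks at random 4K-aligned offsets; return IOPS.

    Hypothetical helper: single-threaded, queue depth 1, no fsync.
    direct=True adds O_DIRECT, which is Linux-only.
    """
    if file_size % BLOCK_SIZE:
        raise ValueError("file_size must be a multiple of BLOCK_SIZE")
    flags = os.O_RDWR | os.O_CREAT
    if direct:
        flags |= os.O_DIRECT  # needs aligned buffer, offset, and length
    fd = os.open(path, flags, 0o644)
    # An anonymous mmap is page-aligned, which satisfies O_DIRECT alignment.
    buf = mmap.mmap(-1, BLOCK_SIZE)
    try:
        buf.write(b"\xab" * BLOCK_SIZE)
        # Preallocate so the timed loop does not extend the file.
        os.posix_fallocate(fd, 0, file_size)
        rng = random.Random(seed)
        nblocks = file_size // BLOCK_SIZE
        start = time.perf_counter()
        for _ in range(num_writes):
            offset = rng.randrange(nblocks) * BLOCK_SIZE
            written = os.pwrite(fd, buf, offset)
            if written != BLOCK_SIZE:
                raise OSError(f"short write: {written} bytes at {offset}")
        elapsed = time.perf_counter() - start
    finally:
        buf.close()
        os.close(fd)
    return num_writes / elapsed
```

One possible A/B comparison: run it with `direct=True` against a file on an ext4 mount without `-o dax`, then with `direct=False` against the same file system mounted with `-o dax`. Writes to a DAX-enabled file take the DAX path regardless of O_DIRECT. Repeating both runs on xfs would mirror the graph. Absolute numbers from this loop will not match fio's; only the relative ratio is of interest.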