Hi,

We have been running some performance experiments on ext4 DAX and noticed some behavior that is puzzling us. We'd be grateful if someone could explain what is going on.

Workload 1:

    create a file
    while file size <= 5GB:
        append 4K
        fsync
    close file

Running time: 20 s

Workload 2:

    open 5GB file
    for each block in file:
        overwrite 4K
        fsync
    close file

Running time: 1.8 s

While we expect workload 1 to take more time than workload 2 since it is extending the file, 10x higher time seems suspicious. If we remove the fsync in workload 1, the running time drops to 3 s. If we remove the fsync in workload 2, the running time is around the same (1.5 s).

We initially suspected this was because of delayed allocation. But when we mounted ext4 without delayed allocation, we got similar results.

We ran this on ext4 with the -o dax option on /dev/pmem0, where the data on a write() system call is written using non-temporal writes. However, we also ran this on ext4 without dax on top of a ramdisk (/dev/ram0), and found no difference in the numbers. So we believe the root cause to be in ext4 and not in the /dev/pmem or /dev/ram device.

Could someone help figure this out?

Thanks,
Vijay
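
P.S. For anyone who wants to try reproducing this, the two workloads above can be sketched roughly as follows. This is an illustrative Python sketch, not the exact benchmark program: the function names, the path and total_bytes arguments, and the do_fsync switch are made up for the example, and it assumes a Unix system where os.pwrite is available.

```python
import os
import tempfile

BLOCK = 4096  # 4K write size, as in both workloads


def append_workload(path, total_bytes, do_fsync=True):
    """Workload 1 (sketch): create a file and grow it with 4K appends."""
    buf = b"a" * BLOCK
    fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_TRUNC, 0o644)
    try:
        written = 0
        while written < total_bytes:
            written += os.write(fd, buf)  # extends the file
            if do_fsync:
                os.fsync(fd)              # flush after every append
    finally:
        os.close(fd)


def overwrite_workload(path, do_fsync=True):
    """Workload 2 (sketch): overwrite each 4K block of an existing file."""
    buf = b"b" * BLOCK
    fd = os.open(path, os.O_WRONLY)
    try:
        size = os.fstat(fd).st_size
        # Assumes the file size is a multiple of BLOCK, so no write extends it.
        for offset in range(0, size, BLOCK):
            os.pwrite(fd, buf, offset)    # in-place write, file size unchanged
            if do_fsync:
                os.fsync(fd)              # flush after every overwrite
    finally:
        os.close(fd)
```

To mirror the experiment, one would point path at a file on the ext4 mount under test, pass total_bytes = 5 * 1024**3, and time each function with do_fsync set to True and then False.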