Dear ext4 developers,

Here is my test: preallocate a large file (2G) with fallocate mode 0, then do
sequential 4K direct-io writes to that file, with fdatasync after every write.
I noticed that if the 2G file is pre-written rather than fallocate'd, I get
more than twice the throughput. I can reproduce this with fio. The storage is
NVMe, and the kernel is 5.3.18 on SUSE.

1. Clear the location:

# rm -rf /mnt/nvme1n1/*

2. Run fio using fallocate:

# taskset -c 0 ./fio -directory=/mnt/nvme1n1 -ioengine=io_uring -fdatasync=1 -direct=1 -rw=write -iodepth=128 -iodepth_batch=64 -iodepth_batch_complete=64 -fallocate=native -bs=4k -size=2G -thread=1 -time_based=0 -numjobs=1 -group_reporting -output=fio.out -name=fiotest

3. Results:

write: IOPS=188k, BW=732MiB/s (768MB/s)(2048MiB/2796msec)

4. Run the same test again; this time the file already exists from the
previous run:

write: IOPS=420k, BW=1640MiB/s (1719MB/s)(2048MiB/1249msec)

It doesn't matter whether or not I pass -fallocate to fio in step 4.

When I run ftrace (and if I am reading the output correctly), I see that in
the first run ext4_convert_unwritten_extents() takes a lot of time,
presumably converting the preallocated (unwritten) extents to written as the
writes complete. This call is not present in the second run.

 110) <...>-11449  | # 1102.026 us |     } /* ext4_convert_unwritten_extents [ext4] */
 110) <...>-11449  |   0.117 us    |     ext4_release_io_end [ext4]();
 110) <...>-11449  | # 1102.421 us |   } /* ext4_put_io_end [ext4] */
 110) <...>-11449  | # 1102.599 us | } /* ext4_end_io_dio [ext4] */

Am I doing something wrong, or is this difference expected? Any suggestions
for getting better throughput without actually pre-writing the file?

Thank you for your time,
Santosh
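
P.S. In case it helps, here is a minimal synchronous C sketch of the same
write pattern outside fio. It is a simplification: the fio job above uses
io_uring with a queue depth, and the path here is just a placeholder.

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BLOCK_SIZE 4096          /* 4K direct-io writes, as in the fio job */

int main(int argc, char **argv)
{
    const off_t file_size = 2LL << 30;   /* 2 GiB, matching -size=2G */
    const char *path = argc > 1 ? argv[1] : "/mnt/nvme1n1/testfile"; /* placeholder */

    int fd = open(path, O_WRONLY | O_CREAT | O_DIRECT, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* Case 1: preallocate with fallocate mode 0, which leaves the extents
     * unwritten. For case 2, skip this and rerun against the file already
     * written by a previous run. */
    if (fallocate(fd, 0, 0, file_size) < 0) { perror("fallocate"); return 1; }

    /* O_DIRECT needs an aligned buffer. */
    void *buf;
    if (posix_memalign(&buf, BLOCK_SIZE, BLOCK_SIZE)) { perror("posix_memalign"); return 1; }
    memset(buf, 0xab, BLOCK_SIZE);

    /* Sequential 4K writes with fdatasync after every write. On the
     * fallocate'd file, each completed write also has to convert
     * unwritten extents to written. */
    for (off_t off = 0; off < file_size; off += BLOCK_SIZE) {
        if (pwrite(fd, buf, BLOCK_SIZE, off) != BLOCK_SIZE) { perror("pwrite"); return 1; }
        if (fdatasync(fd) < 0) { perror("fdatasync"); return 1; }
    }

    free(buf);
    close(fd);
    return 0;
}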
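
P.P.S. To confirm the extent state, I believe filefrag can be used: after the
fallocate-only preallocation the extents show an "unwritten" flag, which is
gone once the blocks have been written and converted.

# filefrag -v /mnt/nvme1n1/fiotest.0.0   (file name from fio's default naming; adjust as needed)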