Hi,

I'm working on io_uring integration for Java and using Fio to get a performance baseline. The problem is that Fio reports lower numbers than my own benchmark, so I do not have a proper baseline.

Part of my problem is that I don't understand how sequential I/O generation works when iodepth is larger than 1 (I'm testing with 64).

In the first version of my benchmark, I iterated over the blocks in the file and issued the requests sequentially, keeping the required number of requests in flight. The problem is that this leads to many consecutive blocks being in flight concurrently, and the Linux I/O scheduler merges many of them. iostat confirmed that a huge percentage of the requests were merged. Using iostat I have also verified that Fio does not show merged read or write requests, so I guess Fio is not using this approach.

Some other approaches I have tried:

1) Split the file into smaller sections and let each stride access only its own portion of the file. E.g. with an iodepth of 64 and a 64 MB file, each stride gets a 1 MB section of the file.
2) Let each stride go over the whole file, but start from a different position. With an iodepth of 64, there are 64 different start offsets.

The performance I'm getting is still significantly higher than what Fio measures, so I'm sure that I'm doing something wrong (it is not an apples-to-apples comparison). But the numbers my benchmark measures are consistent with what iostat reports: the bandwidth and IOPS match.

I have included the Fio configuration for completeness:

    fio --name=sometest --numjobs=1 --filesize=4M --time_based \
        --runtime=60s --ramp_time=2s --ioengine=io_uring --direct=1 \
        --verify=0 --bs=4k --iodepth=64 --rw=read --group_reporting=1 \
        --cpus_allowed=1

I'm testing with direct I/O, using a single thread pinned to core 1 that does sequential reads, and I want to keep 64 requests in flight.

I would appreciate it if someone could shine a light on how Fio generates requests for sequential reads/writes.

Regards,
Peter
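P.S. To make the three offset schemes described above concrete, here is a minimal sketch of how the byte offsets could be computed. It is only an illustration, not the actual benchmark code and not a description of what Fio does: the class name `OffsetSchemes` and its methods are hypothetical, no I/O is issued, and the file size is assumed to be a multiple of block size times depth.

```java
final class OffsetSchemes {

    // First version: a single cursor walks the file block by block.
    // The i-th submitted request reads block i, wrapping at end of file,
    // so up to `depth` consecutive blocks are in flight at the same time.
    static long sequential(long requestIndex, long fileSize, int blockSize) {
        long blocks = fileSize / blockSize;
        return (requestIndex % blocks) * blockSize;
    }

    // Approach 1: the file is split into `depth` equal sections and each
    // stride (slot) only walks its own section, wrapping inside it.
    static long sectioned(int slot, long step, long fileSize, int blockSize, int depth) {
        long blocksPerSection = fileSize / blockSize / depth;
        long sectionStart = (long) slot * blocksPerSection * blockSize;
        return sectionStart + (step % blocksPerSection) * blockSize;
    }

    // Approach 2: every stride walks the whole file, but each one starts
    // at a different block and wraps around at end of file.
    static long staggered(int slot, long step, long fileSize, int blockSize, int depth) {
        long blocks = fileSize / blockSize;
        long startBlock = (long) slot * (blocks / depth);
        return ((startBlock + step) % blocks) * blockSize;
    }

    public static void main(String[] args) {
        long fileSize = 64L << 20; // 64 MB, as in the example above
        int blockSize = 4096;      // matches --bs=4k
        int depth = 64;            // matches --iodepth=64

        // Print the first offset each scheme assigns to the first three slots.
        for (int slot = 0; slot < 3; slot++) {
            System.out.printf("slot %d: sequential=%d sectioned=%d staggered=%d%n",
                    slot,
                    sequential(slot, fileSize, blockSize),
                    sectioned(slot, 0, fileSize, blockSize, depth),
                    staggered(slot, 0, fileSize, blockSize, depth));
        }
    }
}
```

In the first scheme, neighbouring slots always hold adjacent blocks, which would explain the merging seen in iostat. In the other two, concurrent requests are 1 MB apart for this file size.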