In the program I am submitting a large number of concurrent read requests with O_DIRECT. In both scenarios the number of concurrent in-flight requests is capped at 20 000; the only difference is the total number of reads: 8 million for the 512 B buffer and 1 million for the 8 KB buffer.

On 5.8.3 I didn't see any empty (0-byte) reads at all:

  BenchmarkReadAt/uring_512-8     8000000     1879 ns/op   272.55 MB/s
  BenchmarkReadAt/uring_8192-8    1000000    18178 ns/op   450.65 MB/s

I am seeing the same numbers in iotop, so I am fairly confident the benchmark itself is fine. Below is a version with regular syscalls and threads (note that this is Go):

  BenchmarkReadAt/os_512-256      8000000     4393 ns/op   116.55 MB/s
  BenchmarkReadAt/os_8192-256     1000000    18811 ns/op   435.48 MB/s

I ran the same program on 5.9-rc2 and noticed that for the 8 KB / 1 million workload I had to make more than 7 million retries (resubmissions after empty reads), which obviously makes the program very slow. For 512 B / 8 million reads there were only about 22 000 retries, yet it is still much slower for some other reason:

  BenchmarkReadAt/uring_512-8     8000000     8432 ns/op    60.72 MB/s
  BenchmarkReadAt/uring_8192-8    1000000    42603 ns/op   192.29 MB/s

In iotop I see a huge increase for 8 KB: actual disk reads go up to 2 GB/s, which looks suspicious given that my SSD should only sustain about 450 MB/s.

If I lower the number of concurrent requests to 1000, there are almost no empty reads and the 8 KB numbers go back to the level I saw on 5.8.3.

Is this a regression, or should I be throttling submissions?
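For reference, here is a minimal, scaled-down sketch of the retry scheme the "os_" (syscall/thread) variant uses: cap the number of in-flight ReadAt calls, and resubmit whenever a read comes back with 0 bytes and no error. This is an illustration, not the actual benchmark code; the counts are scaled down, the file is a hypothetical temp file created on the spot, and for portability it opens the file without O_DIRECT (the real benchmark adds O_DIRECT with suitably aligned buffers).

```go
package main

import (
	"fmt"
	"os"
	"sync"
)

func main() {
	const (
		bufSize     = 512
		totalReads  = 1024 // scaled down from 8 million
		concurrency = 16   // scaled down from 20 000
	)

	// Hypothetical test file: create and fill it so the reads have data.
	f, err := os.CreateTemp("", "readbench")
	if err != nil {
		panic(err)
	}
	defer os.Remove(f.Name())
	if _, err := f.Write(make([]byte, bufSize*totalReads)); err != nil {
		panic(err)
	}

	var (
		wg      sync.WaitGroup
		mu      sync.Mutex
		retries int
	)
	sem := make(chan struct{}, concurrency) // caps in-flight reads

	for i := 0; i < totalReads; i++ {
		wg.Add(1)
		sem <- struct{}{}
		go func(off int64) {
			defer wg.Done()
			defer func() { <-sem }()
			buf := make([]byte, bufSize)
			for {
				n, err := f.ReadAt(buf, off)
				if n > 0 {
					return // got data, done
				}
				if err != nil {
					panic(err)
				}
				// Empty read: count it and resubmit, which is what
				// blows up to 7M+ iterations on 5.9-rc2.
				mu.Lock()
				retries++
				mu.Unlock()
			}
		}(int64(i) * bufSize)
	}
	wg.Wait()
	fmt.Printf("reads=%d retries=%d\n", totalReads, retries)
}
```

On a buffered file this completes with zero retries; the interesting behaviour in the report only appears with O_DIRECT on the affected kernel.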