The library that I am using: https://github.com/dshulyak/uring
It requires golang 1.14. If that is installed, the benchmark can be run with:

go test ./fs -run=xx -bench=BenchmarkReadAt/uring_8 -benchtime=1000000x
go test ./fs -run=xx -bench=BenchmarkReadAt/uring_5 -benchtime=8000000x

Note that it will set up a uring instance per CPU, with a shared worker pool.
It would take me too much time to implement a repro in C, but in general
I am simply submitting multiple concurrent read requests and watching
the read rate.

On Mon, 24 Aug 2020 at 13:46, Jens Axboe <axboe@xxxxxxxxx> wrote:
>
> On 8/24/20 4:40 AM, Dmitry Shulyak wrote:
> > In the program, I am submitting a large number of concurrent read
> > requests with o_direct. In both scenarios the number of concurrent
> > read requests is limited to 20 000, with only difference being that
> > for 512b total number of reads is 8millions and for 8kb - 1million. On
> > 5.8.3 I didn't see any empty reads at all.
> >
> > BenchmarkReadAt/uring_512-8     8000000     1879 ns/op    272.55 MB/s
> > BenchmarkReadAt/uring_8192-8    1000000    18178 ns/op    450.65 MB/s
> >
> > I am seeing the same numbers in iotop, so pretty confident that the
> > benchmark is fine. Below is a version with regular syscalls and
> > threads (note that this is with golang):
> >
> > BenchmarkReadAt/os_512-256      8000000     4393 ns/op    116.55 MB/s
> > BenchmarkReadAt/os_8192-256     1000000    18811 ns/op    435.48 MB/s
> >
> > I run the same program on 5.9-rc.2 and noticed that for workload with
> > 8kb buffer and 1mill reads I had to make more than 7 millions retries,
> > which obviously makes the program very slow. For 512b and 8million
> > reads there were only 22 000 retries, but it is still very slow for
> > some other reason.
> >
> > BenchmarkReadAt/uring_512-8     8000000     8432 ns/op     60.72 MB/s
> > BenchmarkReadAt/uring_8192-8    1000000    42603 ns/op    192.29 MB/s
> >
> > In iotop i am seeing a huge increase for 8kb, actual disk read goes up
> > to 2gb/s, which looks somewhat suspicious given that my ssd should
> > support only 450mb/s. If I will lower the number of concurrent
> > requests to 1000, then there are almost no empty reads and numbers for
> > 8kb go back to the same level I saw with 5.8.3.
> >
> > Is it a regression or should I throttle submissions?
>
> Since it's performing worse than 5.8, sounds like there is. How can we
> reproduce this?
>
> --
> Jens Axboe
>