Question: t/io_uring performance

Hello,

I am currently trying to run the t/io_uring benchmark, but I am unable to achieve the IOPS that I would expect. In 2019, Axboe achieved 1.6M IOPS [3] or 1.7M IOPS [1] using a single CPU core (4k random reads). On my machine (AMD EPYC 7702P, 2x Intel P4510 NVMe SSDs, plus a separate third SSD for the OS), I can't get anywhere close to those numbers.

Each of my SSDs can handle about 560k IOPS when running t/io_uring on it alone. When I launch the benchmark with both SSDs, I still get only about 580k IOPS in total, of which each SSD receives about 300k IOPS. When I launch two separate t/io_uring instances instead (roughly as sketched below), I get the full 560k IOPS on each device. To me, this sounds like the benchmark is CPU-bound. Given that the CPU is quite decent, I am surprised that a single t/io_uring thread achieves only about half of what the two SSDs could deliver together (and about a third of what Axboe got).
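
For reference, the two separate instances can be started roughly like this (a sketch; the taskset core pinning is just an illustration I am adding here, the instances can also be launched without it):

# taskset -c 0 t/io_uring -b 4096 /dev/nvme0n1 &
# taskset -c 1 t/io_uring -b 4096 /dev/nvme1n1 &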

I am currently limited to Linux 5.4.0 (Ubuntu 20.04), but the numbers from Axboe above are from 2019, around when 5.4 was released. So while I don't expect to reach the numbers from Axboe's more recent measurement [4], 580k IOPS seems far lower than it should be. Does anyone have an idea what could cause this significant difference? You can find some more measurement output below, for reference.

Best regards
Hans-Peter Lehmann

= Measurements =

Performance:
# t/io_uring -b 4096 /dev/nvme0n1 /dev/nvme1n1
i 3, argc 5
Added file /dev/nvme0n1 (submitter 0)
Added file /dev/nvme1n1 (submitter 0)
sq_ring ptr = 0x0x7f9643d92000
sqes ptr    = 0x0x7f9643d90000
cq_ring ptr = 0x0x7f9643d8e000
polled=1, fixedbufs=1, register_files=1, buffered=0 QD=128, sq_ring=128, cq_ring=256
submitter=1207502
IOPS=578400, IOS/call=32/31, inflight=102 (64, 38)
IOPS=582784, IOS/call=32/32, inflight=95 (31, 64)
IOPS=583040, IOS/call=32/31, inflight=125 (61, 64)
IOPS=584665, IOS/call=31/32, inflight=114 (64, 50)

Scheduler for both SSDs disabled:
# cat /sys/block/nvme0n1/queue/scheduler
[none] mq-deadline
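
(For completeness: the scheduler can be switched to none via the standard sysfs knob, along these lines; same path for nvme1n1:)

# echo none > /sys/block/nvme0n1/queue/scheduler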

Most time is spent in the kernel:
# time t/io_uring -b 4096 /dev/nvme0n1 /dev/nvme1n1
[...]
real    0m8.770s
user    0m0.156s
sys     0m8.514s
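
The call graph below was recorded with perf along these lines (a sketch; the exact options may have differed):

# perf record -g -- t/io_uring -b 4096 /dev/nvme0n1 /dev/nvme1n1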

Call graph:
# perf report
- 93.90% io_ring_submit
  - [...]
    - 75.32% io_read
        - 67.13% blkdev_read_iter
          - 65.65% generic_file_read_iter
              - 63.20% blkdev_direct_IO
                - 61.17% __blkdev_direct_IO
                    - 45.49% submit_bio
                      - 43.95% generic_make_request
                          - 33.30% blk_mq_make_request
                            + 8.52% blk_mq_get_request
                            + 8.02% blk_attempt_plug_merge
                            + 5.80% blk_flush_plug_list
                            + 1.48% __blk_queue_split
                            + 1.14% __blk_mq_sched_bio_merge
                            + [...]
                          + 7.90% generic_make_request_checks
                        0.62% blk_mq_make_request
                    + 8.50% bio_alloc_bioset

= References =

[1]: https://kernel.dk/io_uring.pdf
[2]: https://github.com/axboe/fio/issues/579#issuecomment-384345234
[3]: https://twitter.com/axboe/status/1174777844313911296
[4]: https://lore.kernel.org/io-uring/4af91b50-4a9c-8a16-9470-a51430bd7733@xxxxxxxxx/T/#u


