On 11/14/21 20:23, Jens Axboe wrote:
On 11/14/21 10:07 AM, Avi Kivity wrote:
Running a trivial randread, direct=1 fio workload against a RAID-0
composed of some nvme devices, I see this pattern:
fio-7066  [009]  1800.209865: function: io_submit_sqes
fio-7066  [009]  1800.209866: function: rcu_read_unlock_strict
fio-7066  [009]  1800.209866: function: io_submit_sqe
fio-7066  [009]  1800.209866: function: io_init_req
fio-7066  [009]  1800.209866: function: io_file_get
fio-7066  [009]  1800.209866: function: fget_many
fio-7066  [009]  1800.209866: function: __fget_files
fio-7066  [009]  1800.209867: function: rcu_read_unlock_strict
fio-7066  [009]  1800.209867: function: io_req_prep
fio-7066  [009]  1800.209867: function: io_prep_rw
fio-7066  [009]  1800.209867: function: io_queue_sqe
fio-7066  [009]  1800.209867: function: io_req_defer
fio-7066  [009]  1800.209867: function: __io_queue_sqe
fio-7066  [009]  1800.209868: function: io_issue_sqe
fio-7066  [009]  1800.209868: function: io_read
fio-7066  [009]  1800.209868: function: io_import_iovec
fio-7066  [009]  1800.209868: function: __io_file_supports_async
fio-7066  [009]  1800.209868: function: I_BDEV
fio-7066  [009]  1800.209868: function: __kmalloc
fio-7066  [009]  1800.209868: function: kmalloc_slab
fio-7066  [009]  1800.209868: function: __cond_resched
fio-7066  [009]  1800.209868: function: rcu_all_qs
fio-7066  [009]  1800.209869: function: should_failslab
fio-7066  [009]  1800.209869: function: io_req_map_rw
fio-7066  [009]  1800.209869: function: io_arm_poll_handler
fio-7066  [009]  1800.209869: function: io_queue_async_work
fio-7066  [009]  1800.209869: function: io_prep_async_link
fio-7066  [009]  1800.209869: function: io_prep_async_work
fio-7066  [009]  1800.209870: function: io_wq_enqueue
fio-7066  [009]  1800.209870: function: io_wqe_enqueue
fio-7066  [009]  1800.209870: function: _raw_spin_lock_irqsave
fio-7066  [009]  1800.209870: function: _raw_spin_unlock_irqrestore
From this I deduce that __io_file_supports_async() (today named
__io_file_supports_nowait()) returns false, and that every io_uring
operation is therefore bounced to a workqueue, with a correspondingly
large loss in performance.
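For reference, the workload was a job along the lines of the following; the
device path, block size, queue depth, and runtime here are my guesses, not
taken from the original report:

[randread-md0]
filename=/dev/md0
rw=randread
direct=1
ioengine=io_uring
bs=4k
iodepth=32
time_based=1
runtime=30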
However, I also see NOWAIT is part of the default set of flags:
#define QUEUE_FLAG_MQ_DEFAULT	((1 << QUEUE_FLAG_IO_STAT) |		\
				 (1 << QUEUE_FLAG_SAME_COMP) |		\
				 (1 << QUEUE_FLAG_NOWAIT))
and I don't see that md touches it (I do see that dm plays with it).
So, what's the story? Does md not support NOWAIT? If so, that's a huge
blow to io_uring on md. If it does, are there any clues as to why I
see requests bouncing to a workqueue?
That is indeed the story, dm supports it but md doesn't just yet.
Ah, so I missed md clearing the default flags somewhere.

This is a false negative from io_uring's point of view, yes? An md array
on nvme would be essentially nowait in normal operation; it just doesn't
know it. aio on the same devices would not block on this workload.
It's being worked on right now, though:
https://lore.kernel.org/linux-raid/20211101215143.1580-1-vverma@xxxxxxxxxxxxxxxx/
Should be pretty simple, and then we can push to -stable as well.
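For context, the shape of such a fix is presumably small: md advertises
nowait on its own queue once every member device does. The fragment below
is an illustrative sketch written from memory, not the actual patch from
the linked thread; blk_queue_flag_set(), blk_queue_nowait(), and
rdev_for_each() are existing kernel helpers, but the exact placement in md
is an assumption:

/* Illustrative sketch only -- not the patch from the linked thread. */
static void md_set_nowait(struct mddev *mddev)
{
	struct md_rdev *rdev;
	bool nowait = true;

	/* md can honor REQ_NOWAIT only if every member device does. */
	rdev_for_each(rdev, mddev)
		if (!blk_queue_nowait(bdev_get_queue(rdev->bdev)))
			nowait = false;

	if (nowait)
		blk_queue_flag_set(QUEUE_FLAG_NOWAIT, mddev->queue);
}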
That's good to know.