On 11/15/21 1:05 AM, Avi Kivity wrote:
> On 11/14/21 20:23, Jens Axboe wrote:
>> On 11/14/21 10:07 AM, Avi Kivity wrote:
>>> Running a trivial randread, direct=1 fio workload against a RAID-0
>>> composed of some nvme devices, I see this pattern:
>>>
>>>     fio-7066    [009]  1800.209865: function: io_submit_sqes
>>>     fio-7066    [009]  1800.209866: function: rcu_read_unlock_strict
>>>     fio-7066    [009]  1800.209866: function: io_submit_sqe
>>>     fio-7066    [009]  1800.209866: function: io_init_req
>>>     fio-7066    [009]  1800.209866: function: io_file_get
>>>     fio-7066    [009]  1800.209866: function: fget_many
>>>     fio-7066    [009]  1800.209866: function: __fget_files
>>>     fio-7066    [009]  1800.209867: function: rcu_read_unlock_strict
>>>     fio-7066    [009]  1800.209867: function: io_req_prep
>>>     fio-7066    [009]  1800.209867: function: io_prep_rw
>>>     fio-7066    [009]  1800.209867: function: io_queue_sqe
>>>     fio-7066    [009]  1800.209867: function: io_req_defer
>>>     fio-7066    [009]  1800.209867: function: __io_queue_sqe
>>>     fio-7066    [009]  1800.209868: function: io_issue_sqe
>>>     fio-7066    [009]  1800.209868: function: io_read
>>>     fio-7066    [009]  1800.209868: function: io_import_iovec
>>>     fio-7066    [009]  1800.209868: function: __io_file_supports_async
>>>     fio-7066    [009]  1800.209868: function: I_BDEV
>>>     fio-7066    [009]  1800.209868: function: __kmalloc
>>>     fio-7066    [009]  1800.209868: function: kmalloc_slab
>>>     fio-7066    [009]  1800.209868: function: __cond_resched
>>>     fio-7066    [009]  1800.209868: function: rcu_all_qs
>>>     fio-7066    [009]  1800.209869: function: should_failslab
>>>     fio-7066    [009]  1800.209869: function: io_req_map_rw
>>>     fio-7066    [009]  1800.209869: function: io_arm_poll_handler
>>>     fio-7066    [009]  1800.209869: function: io_queue_async_work
>>>     fio-7066    [009]  1800.209869: function: io_prep_async_link
>>>     fio-7066    [009]  1800.209869: function: io_prep_async_work
>>>     fio-7066    [009]  1800.209870: function: io_wq_enqueue
>>>     fio-7066    [009]  1800.209870: function: io_wqe_enqueue
>>>     fio-7066    [009]  1800.209870: function: _raw_spin_lock_irqsave
>>>     fio-7066    [009]  1800.209870: function: _raw_spin_unlock_irqrestore
>>>
>>> From which I deduce that __io_file_supports_async() (today named
>>> __io_file_supports_nowait()) returns false, and therefore every io_uring
>>> operation is bounced to a workqueue, with the resulting great loss in
>>> performance.
>>>
>>> However, I also see that NOWAIT is part of the default set of flags:
>>>
>>>     #define QUEUE_FLAG_MQ_DEFAULT	((1 << QUEUE_FLAG_IO_STAT) |	\
>>>     				 (1 << QUEUE_FLAG_SAME_COMP) |	\
>>>     				 (1 << QUEUE_FLAG_NOWAIT))
>>>
>>> and I don't see that md touches it (I do see that dm plays with it).
>>>
>>> So, what's the story? Does md not support NOWAIT? If so, that's a huge
>>> blow to io_uring with md. If it does, are there any clues as to why I
>>> see requests bouncing to a workqueue?
>>
>> That is indeed the story: dm supports it, but md doesn't just yet.
>
> Ah, so I missed md clearing the default flags somewhere.
>
> This is a false negative from io_uring's point of view, yes? An md on
> nvme would be essentially nowait in normal operation; it just doesn't
> know it. aio on the same device would not block on the same workload.

There are still conditions where it can block; it just didn't in your
test case.

-- 
Jens Axboe
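
For readers following along, "supporting NOWAIT" in a bio-based driver such
as md roughly amounts to two things: setting QUEUE_FLAG_NOWAIT on the request
queue so callers like io_uring attempt inline submission, and completing
REQ_NOWAIT bios with BLK_STS_AGAIN anywhere the driver would otherwise have
to sleep. The sketch below illustrates that contract only; struct example_dev,
its suspended field, and the example_* function names are hypothetical and are
not md's actual code.

    /*
     * Minimal sketch of NOWAIT support in a bio-based driver.
     * All example_* names and struct example_dev are hypothetical.
     */
    #include <linux/blkdev.h>
    #include <linux/bio.h>

    struct example_dev {
            struct gendisk *disk;
            bool suspended;         /* stand-in for "would have to sleep" */
    };

    /* Advertise that the submission path will not sleep for REQ_NOWAIT bios. */
    static void example_enable_nowait(struct example_dev *dev)
    {
            blk_queue_flag_set(QUEUE_FLAG_NOWAIT, dev->disk->queue);
    }

    /* Called on the bio submission path. */
    static void example_submit(struct example_dev *dev, struct bio *bio)
    {
            /*
             * If making progress would block (e.g. the array is suspended),
             * a REQ_NOWAIT bio must be completed with BLK_STS_AGAIN instead
             * of sleeping, so the submitter sees -EAGAIN and can retry.
             */
            if (dev->suspended && (bio->bi_opf & REQ_NOWAIT)) {
                    bio_wouldblock_error(bio);
                    return;
            }
            /* ... normal submission ... */
    }

The contract is why md cannot simply inherit the default flag: a driver may
only advertise QUEUE_FLAG_NOWAIT once every path under its submit_bio()
honors REQ_NOWAIT, otherwise nowait I/O issued inline by io_uring could
silently block the submitting task.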