Re: [PATCH RFC 5.13 2/2] io_uring: submit sqes in the original context when waking up sqthread

On 4/29/21 5:37 AM, Hao Xu wrote:
> On 2021/4/28 10:37 PM, Pavel Begunkov wrote:
>> On 4/28/21 3:34 PM, Pavel Begunkov wrote:
>>> On 4/28/21 2:32 PM, Hao Xu wrote:
>>>> sqes are submitted by the sqthread when SQPOLL is used, which means there
>>>> is extra IO latency while waking up the sqthread. To eliminate it, submit
>>>> a limited number of sqes in the original task context.
>>>> Test results below:
>>>
>>> Frankly, it can be a nest of corner cases, if not now then in the future,
>>> leading to a high maintenance burden. Hence, if we consider the change,
>>> I'd rather limit the userspace exposure, so it can be removed
>>> if needed.
>>>
>>> A noticeable change of behaviour here, as Hao recently asked, is that
>>> the ring can be passed to a task from a completely different thread group,
>>> and so the feature would execute in that context, not in the
>>> original/sqpoll one.
>>
>> So maybe something like:
>>
>> if (same_thread_group()) {
>>     /* submit */
>> }
>
> I thought about this case (cross independent processes) for some time, Pavel,
> could you give more hints about how this may trigger errors?

Currently? We'd need to audit cancellation, but I don't think it's a problem.

But as said, it's about the future. Your patch adds a new quirky
userspace behaviour (submitting from an alien context, as described), and
once committed it can't be removed and has to be maintained.

I can easily imagine it either over-complicating cancellations (and
we have had enough trouble with those), or getting in the way of more
important optimisations, or anything else that tends to happen with new
features.
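
For reference, a rough sketch of the gating I have in mind in the
io_uring_enter() SQPOLL wakeup path (just the shape of the idea, not a
patch; "sqd_owner_task" is a placeholder for however we'd track the task
that originally set up the sqpoll context):

	/* only take the deputy-submit path when the submitter belongs to
	 * the thread group that set up the sqpoll context; otherwise
	 * behave as before.  sqd_owner_task is a placeholder, not an
	 * existing field. */
	if ((flags & IORING_ENTER_SQ_DEPUTY) &&
	    same_thread_group(current, sqd_owner_task)) {
		/* submit a bounded number of sqes inline, in this task */
	} else if (flags & IORING_ENTER_SQ_WAKEUP) {
		/* just wake the sqpoll thread as we do today */
		wake_up(&ctx->sq_data->wait);
	}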


>>
>>>
>>> Not sure the IORING_ENTER_SQ_DEPUTY knob is needed, but at least it can
>>> be ignored if the previous point is addressed.
>>
>> I'd question whether it'd be better with the flag, or without it and
>> just doing this by default.
> Just like what Jens said, the flag here is to allow users to make their
> own decision; there may be cases where an application wants to offload
> as much IO-related work as possible to sqpoll, so that it can be
> dedicated to computation work, etc.
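
For what it's worth, from the application side the opt-in would presumably
look something like the call below (a sketch assuming the RFC's
IORING_ENTER_SQ_DEPUTY flag; raw syscall shown, ring_fd/to_submit being the
app's own values):

	#include <unistd.h>
	#include <sys/syscall.h>
	#include <linux/io_uring.h>

	/* with SQPOLL the app normally only enters the kernel when the sq
	 * thread has gone idle; the extra flag would ask the kernel to also
	 * submit a few sqes in this task's context instead of waiting for
	 * the sqpoll thread to wake up */
	syscall(__NR_io_uring_enter, ring_fd, to_submit, 0,
		IORING_ENTER_SQ_WAKEUP | IORING_ENTER_SQ_DEPUTY, NULL);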
>>
>>>
>>>>
>>>> 99th latency:
>>>> iops\idle   10us   60us   110us   160us   210us   260us   310us   360us   410us   460us    510us
>>>> with this patch:
>>>> 2k          13     13     12      13      13      12      12      11      11      10.304   11.84
>>>> without this patch:
>>>> 2k          15     14     15      15      15      14      15      14      14      13       11.84
>>>
>>> Not sure the 99th percentile alone describes it well enough; can you
>>> please add more data? Mean latency, 50%, 90%, 99%, 99.9%, throughput.
> Sure, I will.

Forgot to mention, but it's important: this should also be compared with
non-sqpoll, because the feature takes the middle ground between the two.

>>>
>>> Btw, how did it happen that only some of the numbers have a fractional
>>> part? Hard to believe that all but 3 of them were close enough to
>>> integer values.
> This confused me a little bit too, but it is indeed what fio outputs.

It's just that whenever I see something like that, it tells me the data may
have been manipulated. Even if it really is what fio printed, it's weird and
suspicious, and worth looking into what's wrong with it.

>>>
>>>> fio config:
>>>> ./run_fio.sh
>>>> fio \
>>>> --ioengine=io_uring --sqthread_poll=1 --hipri=1 --thread=1 --bs=4k \
>>>> --direct=1 --rw=randread --time_based=1 --runtime=300 \
>>>> --group_reporting=1 --filename=/dev/nvme1n1 --sqthread_poll_cpu=30 \
>>>> --randrepeat=0 --cpus_allowed=35 --iodepth=128 --rate_iops=${1} \
>>>> --io_sq_thread_idle=${2}
>>>>
>>>> Signed-off-by: Hao Xu <haoxu@xxxxxxxxxxxxxxxxx>
>>>> ---
>>>>   fs/io_uring.c                 | 29 +++++++++++++++++++++++------
>>>>   include/uapi/linux/io_uring.h |  1 +
>>>>   2 files changed, 24 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>> index 1871fad48412..f0a01232671e 100644
>>>> --- a/fs/io_uring.c
>>>> +++ b/fs/io_uring.c
>>>> @@ -1252,7 +1252,12 @@ static void io_queue_async_work(struct io_kiocb *req)
>>>>   {
>>>>       struct io_ring_ctx *ctx = req->ctx;
>>>>       struct io_kiocb *link = io_prep_linked_timeout(req);
>>>> -    struct io_uring_task *tctx = req->task->io_uring;
>>>> +    struct io_uring_task *tctx = NULL;
>>>> +
>>>> +    if (ctx->sq_data && ctx->sq_data->thread)
>>>> +        tctx = ctx->sq_data->thread->io_uring;
>>>
>>> without parking it's racy: sq_data->thread may become NULL and be freed,
>>> as well as its ->io_uring.
>>>
>>>> +    else
>>>> +        tctx = req->task->io_uring;
>>>>         BUG_ON(!tctx);
>>>>       BUG_ON(!tctx->io_wq);
>>>
>>> [snip]
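
To illustrate the parking point with a sketch (not the RFC patch, just the
idea): the sqpoll task has to be parked around the dereference so it can't
exit and free its ->io_uring underneath us. Whether taking sqd->lock is
acceptable in this path is a separate question.

	struct io_uring_task *tctx = NULL;

	if (ctx->sq_data) {
		struct io_sq_data *sqd = ctx->sq_data;

		/* parking pins the sqpoll task until we unpark */
		io_sq_thread_park(sqd);
		if (sqd->thread)
			tctx = sqd->thread->io_uring;
		io_sq_thread_unpark(sqd);
	}
	if (!tctx)
		tctx = req->task->io_uring;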
>>>
>>
> 

-- 
Pavel Begunkov


