Re: io_uring vs CPU hotplug, was Re: [PATCH 5/9] blk-mq: don't set data->ctx and data->hctx in blk_mq_alloc_request_hctx

Jens Axboe <axboe@xxxxxxxxx> · Thu, 21 May 2020 12:45:11 -0600

On 5/21/20 12:39 PM, Thomas Gleixner wrote:
> Ming,
> 
> Ming Lei <ming.lei@xxxxxxxxxx> writes:
>> On Thu, May 21, 2020 at 10:13:59AM +0200, Thomas Gleixner wrote:
>>> Ming Lei <ming.lei@xxxxxxxxxx> writes:
>>>> On Thu, May 21, 2020 at 12:14:18AM +0200, Thomas Gleixner wrote:
>>>> - otherwise, the kthread just retries and retries to allocate & release,
>>>> and sooner or later, its time slice is consumed, and migrated out, and the
>>>> cpu hotplug handler will get chance to run and move on, then the cpu is
>>>> shutdown.
>>>
>>> 1) This is based on the assumption that the kthread is in the SCHED_OTHER
>>>    scheduling class. Is that really a valid assumption?
>>
>> Given it is unlikely path, we can add msleep() before retrying when INACTIVE bit
>> is observed by current thread, and this way can avoid spinning and should work
>> for other schedulers.
> 
> That should work, but pretty is something else
> 
>>>
>>> 2) What happens in the following scenario:
>>>
>>>    unplug
>>>
>>>      mq_offline
>>>        set_ctx_inactive()
>>>        drain_io()
>>>        
>>>    io_kthread()
>>>        try_queue()
>>>        wait_on_ctx()
>>>
>>>    Can this happen and if so what will wake up that thread?
>>
>> drain_io() releases all tag of this hctx, then wait_on_ctx() will be waken up
>> after any tag is released.
> 
> drain_io() is already done ...
> 
> So looking at that thread function:
> 
> static int io_sq_thread(void *data)
> {
> 	struct io_ring_ctx *ctx = data;
> 
>         while (...) {
>               ....
> 	      to_submit = io_sqring_entries(ctx);
> 
> --> preemption
> 
> hotplug runs
>    mq_offline()
>       set_ctx_inactive();
>       drain_io();
>       finished();
> 
> --> thread runs again
> 
>       mutex_lock(&ctx->uring_lock);
>       ret = io_submit_sqes(ctx, to_submit, NULL, -1, true);
>       mutex_unlock(&ctx->uring_lock);
> 
>       ....
> 
>       if (!to_submit || ret == -EBUSY)
>           ...
>       	  wait_on_ctx();
> 
> Can this happen or did drain_io() already take care of the 'to_submit'
> items and the call to io_submit_sqes() turns into a zero action ?
> 
> If the above happens then nothing will wake it up because the context
> draining is done and finished.

Again, this is mixing up io_uring and blk-mq. Maybe it's the fact that
both use 'ctx' that makes this confusing. On the blk-mq side, the 'ctx'
is the per-cpu queue context, for io_uring it's the io_uring instance.

io_sq_thread() doesn't care about any sort of percpu mappings, it's
happy as long as it'll keep running regardless of whether or not the
optional pinned CPU is selected and then offlined.

-- 
Jens Axboe