Re: [PATCH v2] block: directly insert blk-mq request from blk_insert_cloned_request()

Ming Lei <tom.leiming@xxxxxxxxx> · Fri, 15 Sep 2017 00:33:36 +0800

On Fri, Sep 15, 2017 at 12:30 AM, Jens Axboe <axboe@xxxxxxxxx> wrote:
> On 09/14/2017 09:57 AM, Ming Lei wrote:
>> On Tue, Sep 12, 2017 at 5:27 AM, Jens Axboe <axboe@xxxxxxxxx> wrote:
>>> On 09/11/2017 03:13 PM, Mike Snitzer wrote:
>>>> On Mon, Sep 11 2017 at  4:51pm -0400,
>>>> Jens Axboe <axboe@xxxxxxxxx> wrote:
>>>>
>>>>> On 09/11/2017 10:16 AM, Mike Snitzer wrote:
>>>>>> Here is v2 that should obviate the need to rename blk_mq_insert_request
>>>>>> (by using bools to control run_queue and async).
>>>>>>
>>>>>> As for inserting directly into dispatch, if that can be done that is
>>>>>> great but I'd prefer to have that be a follow-up optimization.  This
>>>>>> fixes the regression in question, and does so in well-known terms.
>>>>>>
>>>>>> What do you think?
>>>>>
>>>>> I think it looks reasonable. My only concern is the use of the software
>>>>> queues. Depending on the scheduler, they may or may not be used. I'd
>>>>> need to review the code, but my first thought is that this would break
>>>>> if you use blk_mq_insert_request() on a device that is managed by
>>>>> mq-deadline or bfq, for instance. Schedulers are free to use the
>>>>> software queues, but they are also free to ignore them and use internal
>>>>> queuing.
>>>>>
>>>>> Looking at the code, looks like this was changed slightly at some point,
>>>>> we always flush the software queues, if any of them contain requests. So
>>>>> it's probably fine.
>>>>
>>>> OK good, but is that too brittle to rely on? Something that might change
>>>> in the future?
>>>
>>> I'm actually surprised we do flush software queues for that case, since
>>> we don't always have to. So it is a bit of a wart. If we don't have a
>>> scheduler, software queues is where IO goes. If we have a scheduler, the
>>> scheduler has complete control of where to queue IO. Generally, the
>>> scheduler will either utilize the software queues or it won't, there's
>>> nothing in between.
>>>
>>> I know realize I'm an idiot and didn't read it right. So here's the code
>>> in question:
>>>
>>> const bool has_sched_dispatch = e && e->type->ops.mq.dispatch_request;
>>>
>>> [...]
>>>
>>> } else if (!has_sched_dispatch) {
>>>         blk_mq_flush_busy_ctxs(hctx, &rq_list);
>>>         blk_mq_dispatch_rq_list(q, &rq_list);
>>> }
>>>
>>> so we do only enter sw queue flushing, if we don't have a scheduler with
>>> a dispatch_request hook. So now I am really wondering how your patch
>>> could work if the bottom device has bfq or mq-deadline attached?
>>>
>>>>> My earlier suggestion to use just hctx->dispatch for the IO and bypass
>>>>> the software queues completely. The use case for the dispatch list is
>>>>> the same, regardless of whether the device has a scheduler attached or
>>>>> not.
>>>>
>>>> I'm missing how these details relate to the goal of bypassing any
>>>> scheduler that might be attached.  Are you saying the attached elevator
>>>> would still get in the way?
>>>
>>> See above.
>>>
>>>> Looking at blk_mq_sched_insert_request(), submission when an elevator
>>>> isn't attached is exactly what I made blk_mq_insert_request() do
>>>> (which is exactly what it did in the past).
>>>
>>> Right, but that path is only used if we don't have a scheduler attached.
>>> So while the code will use that path IFF a scheduler isn't attached to
>>> that device, your use case will use it for both cases.
>>>
>>>> In the case of DM multipath, nothing else should be submitting IO to
>>>> the device so elevator shouldn't be used -- only interface for
>>>> submitting IO would be blk_mq_insert_request().  So even if a
>>>> scheduler is attached it should be bypassed right?
>>>
>>> The problem is the usage of the sw queue.
>>>
>>> Does the below work for you?
>>>
>>>
>>> diff --git a/block/blk-core.c b/block/blk-core.c
>>> index d709c0e3a2ac..aebe676225e6 100644
>>> --- a/block/blk-core.c
>>> +++ b/block/blk-core.c
>>> @@ -2342,7 +2342,12 @@ blk_status_t blk_insert_cloned_request(struct request_queue *q, struct request *
>>>         if (q->mq_ops) {
>>>                 if (blk_queue_io_stat(q))
>>>                         blk_account_io_start(rq, true);
>>> -               blk_mq_sched_insert_request(rq, false, true, false, false);
>>> +               /*
>>> +                * Since we have a scheduler attached on the top device,
>>> +                * bypass a potential scheduler on the bottom device for
>>> +                * insert.
>>> +                */
>>> +               blk_mq_request_bypass_insert(rq);
>>>                 return BLK_STS_OK;
>>>         }
>>>
>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>> index 3f18cff80050..98a18609755e 100644
>>> --- a/block/blk-mq.c
>>> +++ b/block/blk-mq.c
>>> @@ -1401,6 +1401,22 @@ void __blk_mq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
>>>         blk_mq_hctx_mark_pending(hctx, ctx);
>>>  }
>>>
>>> +/*
>>> + * Should only be used carefully, when the caller knows we want to
>>> + * bypass a potential IO scheduler on the target device.
>>> + */
>>> +void blk_mq_request_bypass_insert(struct request *rq)
>>> +{
>>> +       struct blk_mq_ctx *ctx = rq->mq_ctx;
>>> +       struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(rq->q, ctx->cpu);
>>> +
>>> +       spin_lock(&hctx->lock);
>>> +       list_add_tail(&rq->queuelist, &hctx->dispatch);
>>> +       spin_unlock(&hctx->lock);
>>> +
>>> +       blk_mq_run_hw_queue(hctx, false);
>>> +}
>>> +
>>
>> Hello Jens and Mike,
>>
>> This patch sends flush request to ->dispatch directly too, which changes the
>> previous behaviour, is that OK for dm-rq?
>
> That's a good question, I need to look into that. The flush behavior is so
> annoying. Did you make any progress on fixing up the patch you posted the
> other day on treating flushes like any other request?

It has been ready, will post it out later.

-- 
Ming Lei