Re: [PATCH v2] block: directly insert blk-mq request from blk_insert_cloned_request()

Ming Lei <tom.leiming@xxxxxxxxx> · Thu, 14 Sep 2017 23:57:19 +0800

On Tue, Sep 12, 2017 at 5:27 AM, Jens Axboe <axboe@xxxxxxxxx> wrote:
> On 09/11/2017 03:13 PM, Mike Snitzer wrote:
>> On Mon, Sep 11 2017 at  4:51pm -0400,
>> Jens Axboe <axboe@xxxxxxxxx> wrote:
>>
>>> On 09/11/2017 10:16 AM, Mike Snitzer wrote:
>>>> Here is v2 that should obviate the need to rename blk_mq_insert_request
>>>> (by using bools to control run_queue and async).
>>>>
>>>> As for inserting directly into dispatch, if that can be done that is
>>>> great but I'd prefer to have that be a follow-up optimization.  This
>>>> fixes the regression in question, and does so in well-known terms.
>>>>
>>>> What do you think?
>>>
>>> I think it looks reasonable. My only concern is the use of the software
>>> queues. Depending on the scheduler, they may or may not be used. I'd
>>> need to review the code, but my first thought is that this would break
>>> if you use blk_mq_insert_request() on a device that is managed by
>>> mq-deadline or bfq, for instance. Schedulers are free to use the
>>> software queues, but they are also free to ignore them and use internal
>>> queuing.
>>>
>>> Looking at the code, looks like this was changed slightly at some point,
>>> we always flush the software queues, if any of them contain requests. So
>>> it's probably fine.
>>
>> OK good, but is that too brittle to rely on? Something that might change
>> in the future?
>
> I'm actually surprised we do flush software queues for that case, since
> we don't always have to. So it is a bit of a wart. If we don't have a
> scheduler, software queues is where IO goes. If we have a scheduler, the
> scheduler has complete control of where to queue IO. Generally, the
> scheduler will either utilize the software queues or it won't, there's
> nothing in between.
>
> I know realize I'm an idiot and didn't read it right. So here's the code
> in question:
>
> const bool has_sched_dispatch = e && e->type->ops.mq.dispatch_request;
>
> [...]
>
> } else if (!has_sched_dispatch) {
>         blk_mq_flush_busy_ctxs(hctx, &rq_list);
>         blk_mq_dispatch_rq_list(q, &rq_list);
> }
>
> so we do only enter sw queue flushing, if we don't have a scheduler with
> a dispatch_request hook. So now I am really wondering how your patch
> could work if the bottom device has bfq or mq-deadline attached?
>
>>> My earlier suggestion to use just hctx->dispatch for the IO and bypass
>>> the software queues completely. The use case for the dispatch list is
>>> the same, regardless of whether the device has a scheduler attached or
>>> not.
>>
>> I'm missing how these details relate to the goal of bypassing any
>> scheduler that might be attached.  Are you saying the attached elevator
>> would still get in the way?
>
> See above.
>
>> Looking at blk_mq_sched_insert_request(), submission when an elevator
>> isn't attached is exactly what I made blk_mq_insert_request() do
>> (which is exactly what it did in the past).
>
> Right, but that path is only used if we don't have a scheduler attached.
> So while the code will use that path IFF a scheduler isn't attached to
> that device, your use case will use it for both cases.
>
>> In the case of DM multipath, nothing else should be submitting IO to
>> the device so elevator shouldn't be used -- only interface for
>> submitting IO would be blk_mq_insert_request().  So even if a
>> scheduler is attached it should be bypassed right?
>
> The problem is the usage of the sw queue.
>
> Does the below work for you?
>
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index d709c0e3a2ac..aebe676225e6 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -2342,7 +2342,12 @@ blk_status_t blk_insert_cloned_request(struct request_queue *q, struct request *
>         if (q->mq_ops) {
>                 if (blk_queue_io_stat(q))
>                         blk_account_io_start(rq, true);
> -               blk_mq_sched_insert_request(rq, false, true, false, false);
> +               /*
> +                * Since we have a scheduler attached on the top device,
> +                * bypass a potential scheduler on the bottom device for
> +                * insert.
> +                */
> +               blk_mq_request_bypass_insert(rq);
>                 return BLK_STS_OK;
>         }
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 3f18cff80050..98a18609755e 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1401,6 +1401,22 @@ void __blk_mq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
>         blk_mq_hctx_mark_pending(hctx, ctx);
>  }
>
> +/*
> + * Should only be used carefully, when the caller knows we want to
> + * bypass a potential IO scheduler on the target device.
> + */
> +void blk_mq_request_bypass_insert(struct request *rq)
> +{
> +       struct blk_mq_ctx *ctx = rq->mq_ctx;
> +       struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(rq->q, ctx->cpu);
> +
> +       spin_lock(&hctx->lock);
> +       list_add_tail(&rq->queuelist, &hctx->dispatch);
> +       spin_unlock(&hctx->lock);
> +
> +       blk_mq_run_hw_queue(hctx, false);
> +}
> +

Hello Jens and Mike,

This patch sends flush request to ->dispatch directly too, which changes the
previous behaviour, is that OK for dm-rq?

-- 
Ming Lei