On 09/14/2017 10:33 AM, Ming Lei wrote: > On Fri, Sep 15, 2017 at 12:30 AM, Jens Axboe <axboe@xxxxxxxxx> wrote: >> On 09/14/2017 09:57 AM, Ming Lei wrote: >>> On Tue, Sep 12, 2017 at 5:27 AM, Jens Axboe <axboe@xxxxxxxxx> wrote: >>>> On 09/11/2017 03:13 PM, Mike Snitzer wrote: >>>>> On Mon, Sep 11 2017 at 4:51pm -0400, >>>>> Jens Axboe <axboe@xxxxxxxxx> wrote: >>>>> >>>>>> On 09/11/2017 10:16 AM, Mike Snitzer wrote: >>>>>>> Here is v2 that should obviate the need to rename blk_mq_insert_request >>>>>>> (by using bools to control run_queue and async). >>>>>>> >>>>>>> As for inserting directly into dispatch, if that can be done that is >>>>>>> great but I'd prefer to have that be a follow-up optimization. This >>>>>>> fixes the regression in question, and does so in well-known terms. >>>>>>> >>>>>>> What do you think? >>>>>> >>>>>> I think it looks reasonable. My only concern is the use of the software >>>>>> queues. Depending on the scheduler, they may or may not be used. I'd >>>>>> need to review the code, but my first thought is that this would break >>>>>> if you use blk_mq_insert_request() on a device that is managed by >>>>>> mq-deadline or bfq, for instance. Schedulers are free to use the >>>>>> software queues, but they are also free to ignore them and use internal >>>>>> queuing. >>>>>> >>>>>> Looking at the code, looks like this was changed slightly at some point, >>>>>> we always flush the software queues, if any of them contain requests. So >>>>>> it's probably fine. >>>>> >>>>> OK good, but is that too brittle to rely on? Something that might change >>>>> in the future? >>>> >>>> I'm actually surprised we do flush software queues for that case, since >>>> we don't always have to. So it is a bit of a wart. If we don't have a >>>> scheduler, software queues is where IO goes. If we have a scheduler, the >>>> scheduler has complete control of where to queue IO. Generally, the >>>> scheduler will either utilize the software queues or it won't, there's >>>> nothing in between. >>>> >>>> I know realize I'm an idiot and didn't read it right. So here's the code >>>> in question: >>>> >>>> const bool has_sched_dispatch = e && e->type->ops.mq.dispatch_request; >>>> >>>> [...] >>>> >>>> } else if (!has_sched_dispatch) { >>>> blk_mq_flush_busy_ctxs(hctx, &rq_list); >>>> blk_mq_dispatch_rq_list(q, &rq_list); >>>> } >>>> >>>> so we do only enter sw queue flushing, if we don't have a scheduler with >>>> a dispatch_request hook. So now I am really wondering how your patch >>>> could work if the bottom device has bfq or mq-deadline attached? >>>> >>>>>> My earlier suggestion to use just hctx->dispatch for the IO and bypass >>>>>> the software queues completely. The use case for the dispatch list is >>>>>> the same, regardless of whether the device has a scheduler attached or >>>>>> not. >>>>> >>>>> I'm missing how these details relate to the goal of bypassing any >>>>> scheduler that might be attached. Are you saying the attached elevator >>>>> would still get in the way? >>>> >>>> See above. >>>> >>>>> Looking at blk_mq_sched_insert_request(), submission when an elevator >>>>> isn't attached is exactly what I made blk_mq_insert_request() do >>>>> (which is exactly what it did in the past). >>>> >>>> Right, but that path is only used if we don't have a scheduler attached. >>>> So while the code will use that path IFF a scheduler isn't attached to >>>> that device, your use case will use it for both cases. >>>> >>>>> In the case of DM multipath, nothing else should be submitting IO to >>>>> the device so elevator shouldn't be used -- only interface for >>>>> submitting IO would be blk_mq_insert_request(). So even if a >>>>> scheduler is attached it should be bypassed right? >>>> >>>> The problem is the usage of the sw queue. >>>> >>>> Does the below work for you? >>>> >>>> >>>> diff --git a/block/blk-core.c b/block/blk-core.c >>>> index d709c0e3a2ac..aebe676225e6 100644 >>>> --- a/block/blk-core.c >>>> +++ b/block/blk-core.c >>>> @@ -2342,7 +2342,12 @@ blk_status_t blk_insert_cloned_request(struct request_queue *q, struct request * >>>> if (q->mq_ops) { >>>> if (blk_queue_io_stat(q)) >>>> blk_account_io_start(rq, true); >>>> - blk_mq_sched_insert_request(rq, false, true, false, false); >>>> + /* >>>> + * Since we have a scheduler attached on the top device, >>>> + * bypass a potential scheduler on the bottom device for >>>> + * insert. >>>> + */ >>>> + blk_mq_request_bypass_insert(rq); >>>> return BLK_STS_OK; >>>> } >>>> >>>> diff --git a/block/blk-mq.c b/block/blk-mq.c >>>> index 3f18cff80050..98a18609755e 100644 >>>> --- a/block/blk-mq.c >>>> +++ b/block/blk-mq.c >>>> @@ -1401,6 +1401,22 @@ void __blk_mq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq, >>>> blk_mq_hctx_mark_pending(hctx, ctx); >>>> } >>>> >>>> +/* >>>> + * Should only be used carefully, when the caller knows we want to >>>> + * bypass a potential IO scheduler on the target device. >>>> + */ >>>> +void blk_mq_request_bypass_insert(struct request *rq) >>>> +{ >>>> + struct blk_mq_ctx *ctx = rq->mq_ctx; >>>> + struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(rq->q, ctx->cpu); >>>> + >>>> + spin_lock(&hctx->lock); >>>> + list_add_tail(&rq->queuelist, &hctx->dispatch); >>>> + spin_unlock(&hctx->lock); >>>> + >>>> + blk_mq_run_hw_queue(hctx, false); >>>> +} >>>> + >>> >>> Hello Jens and Mike, >>> >>> This patch sends flush request to ->dispatch directly too, which changes the >>> previous behaviour, is that OK for dm-rq? >> >> That's a good question, I need to look into that. The flush behavior is so >> annoying. Did you make any progress on fixing up the patch you posted the >> other day on treating flushes like any other request? > > It has been ready, will post it out later. OK good, if that's clean enough, then I think going that route is a much better idea than introducing more flush/not-flush logic. I liked the initial patch from a concept point of view, and it enables us to get rid of a few nasty hacks. -- Jens Axboe