Re: [PATCH v2] block: directly insert blk-mq request from blk_insert_cloned_request()

Jens Axboe <axboe@xxxxxxxxx> · Mon, 11 Sep 2017 15:27:57 -0600

On 09/11/2017 03:13 PM, Mike Snitzer wrote:
> On Mon, Sep 11 2017 at  4:51pm -0400,
> Jens Axboe <axboe@xxxxxxxxx> wrote:
> 
>> On 09/11/2017 10:16 AM, Mike Snitzer wrote:
>>> Here is v2 that should obviate the need to rename blk_mq_insert_request
>>> (by using bools to control run_queue and async).
>>>
>>> As for inserting directly into dispatch, if that can be done that is
>>> great but I'd prefer to have that be a follow-up optimization.  This
>>> fixes the regression in question, and does so in well-known terms.
>>>
>>> What do you think?
>>
>> I think it looks reasonable. My only concern is the use of the software
>> queues. Depending on the scheduler, they may or may not be used. I'd
>> need to review the code, but my first thought is that this would break
>> if you use blk_mq_insert_request() on a device that is managed by
>> mq-deadline or bfq, for instance. Schedulers are free to use the
>> software queues, but they are also free to ignore them and use internal
>> queuing.
>>
>> Looking at the code, looks like this was changed slightly at some point,
>> we always flush the software queues, if any of them contain requests. So
>> it's probably fine.
> 
> OK good, but is that too brittle to rely on? Something that might change
> in the future?

I'm actually surprised we do flush software queues for that case, since
we don't always have to. So it is a bit of a wart. If we don't have a
scheduler, software queues is where IO goes. If we have a scheduler, the
scheduler has complete control of where to queue IO. Generally, the
scheduler will either utilize the software queues or it won't, there's
nothing in between.

I know realize I'm an idiot and didn't read it right. So here's the code
in question:

const bool has_sched_dispatch = e && e->type->ops.mq.dispatch_request;  

[...]

} else if (!has_sched_dispatch) {                                       
        blk_mq_flush_busy_ctxs(hctx, &rq_list);                         
        blk_mq_dispatch_rq_list(q, &rq_list);                           
}

so we do only enter sw queue flushing, if we don't have a scheduler with
a dispatch_request hook. So now I am really wondering how your patch
could work if the bottom device has bfq or mq-deadline attached?

>> My earlier suggestion to use just hctx->dispatch for the IO and bypass
>> the software queues completely. The use case for the dispatch list is
>> the same, regardless of whether the device has a scheduler attached or
>> not.
> 
> I'm missing how these details relate to the goal of bypassing any
> scheduler that might be attached.  Are you saying the attached elevator
> would still get in the way?

See above.

> Looking at blk_mq_sched_insert_request(), submission when an elevator
> isn't attached is exactly what I made blk_mq_insert_request() do
> (which is exactly what it did in the past).

Right, but that path is only used if we don't have a scheduler attached.
So while the code will use that path IFF a scheduler isn't attached to
that device, your use case will use it for both cases.

> In the case of DM multipath, nothing else should be submitting IO to
> the device so elevator shouldn't be used -- only interface for
> submitting IO would be blk_mq_insert_request().  So even if a
> scheduler is attached it should be bypassed right?

The problem is the usage of the sw queue.

Does the below work for you?

diff --git a/block/blk-core.c b/block/blk-core.c
index d709c0e3a2ac..aebe676225e6 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2342,7 +2342,12 @@ blk_status_t blk_insert_cloned_request(struct request_queue *q, struct request *
 	if (q->mq_ops) {
 		if (blk_queue_io_stat(q))
 			blk_account_io_start(rq, true);
-		blk_mq_sched_insert_request(rq, false, true, false, false);
+		/*
+		 * Since we have a scheduler attached on the top device,
+		 * bypass a potential scheduler on the bottom device for
+		 * insert.
+		 */
+		blk_mq_request_bypass_insert(rq);
 		return BLK_STS_OK;
 	}
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 3f18cff80050..98a18609755e 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1401,6 +1401,22 @@ void __blk_mq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 	blk_mq_hctx_mark_pending(hctx, ctx);
 }
 
+/*
+ * Should only be used carefully, when the caller knows we want to
+ * bypass a potential IO scheduler on the target device.
+ */
+void blk_mq_request_bypass_insert(struct request *rq)
+{
+	struct blk_mq_ctx *ctx = rq->mq_ctx;
+	struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(rq->q, ctx->cpu);
+
+	spin_lock(&hctx->lock);
+	list_add_tail(&rq->queuelist, &hctx->dispatch);
+	spin_unlock(&hctx->lock);
+
+	blk_mq_run_hw_queue(hctx, false);
+}
+
 void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx,
 			    struct list_head *list)
 
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 98252b79b80b..ef15b3414da5 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -54,6 +54,7 @@ int blk_mq_alloc_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
  */
 void __blk_mq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 				bool at_head);
+void blk_mq_request_bypass_insert(struct request *rq);
 void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx,
 				struct list_head *list);
 

-- 
Jens Axboe