On Wed, Aug 30, 2017 at 05:11:00PM +0000, Bart Van Assche wrote:
> On Sun, 2017-08-27 at 00:33 +0800, Ming Lei wrote:
> > During dispatching, we moved all requests from hctx->dispatch to
> > one temporary list, then dispatch them one by one from this list.
> > Unfortunately duirng this period, run queue from other contexts
>                 ^^^^^^
> during?

OK.

> > may think the queue is idle, then start to dequeue from sw/scheduler
> > queue and still try to dispatch because ->dispatch is empty. This way
> > hurts sequential I/O performance because requests are dequeued when
> > lld queue is busy.
>
> [ ... ]
>
> > diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> > index 735e432294ab..4d7bea8c2594 100644
> > --- a/block/blk-mq-sched.c
> > +++ b/block/blk-mq-sched.c
> > @@ -146,7 +146,6 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> >  	struct request_queue *q = hctx->queue;
> >  	struct elevator_queue *e = q->elevator;
> >  	const bool has_sched_dispatch = e && e->type->ops.mq.dispatch_request;
> > -	bool do_sched_dispatch = true;
> >  	LIST_HEAD(rq_list);
> >
> >  	/* RCU or SRCU read lock is needed before checking quiesced flag */
>
> Shouldn't blk_mq_sched_dispatch_requests() set BLK_MQ_S_DISPATCH_BUSY just after
> the following statement because this statement makes the dispatch list empty?

Actually that is what I did in V1. I changed it because setting the BUSY
flag there would widen the race window a bit: for example, if a request
is added to ->dispatch just after ->dispatch is flushed, a check on the
BUSY bit won't catch that case.

This way we also avoid checking both the bit and
list_empty_careful(&hctx->dispatch) in blk_mq_sched_dispatch_requests(),
so the code becomes simpler and more readable if the flag is simply set
from the beginning, i.e. wherever a request is added to ->dispatch.

> >  	list_splice_init(&hctx->dispatch, &rq_list);
>
> > @@ -177,8 +176,33 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> >  	 */
> >  	if (!list_empty(&rq_list)) {
> >  		blk_mq_sched_mark_restart_hctx(hctx);
> > -		do_sched_dispatch = blk_mq_dispatch_rq_list(q, &rq_list);
> > -	} else if (!has_sched_dispatch && !q->queue_depth) {
> > +		blk_mq_dispatch_rq_list(q, &rq_list);
> > +
> > +		/*
> > +		 * We may clear DISPATCH_BUSY just after it
> > +		 * is set from another context, the only cost
> > +		 * is that one request is dequeued a bit early,
> > +		 * we can survive that. Given the window is
> > +		 * too small, no need to worry about performance
>           ^^^
> The word "too" seems extraneous to me in this sentence.

Maybe 'extremely' is better, or we can just remove it?

> > bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
> > @@ -330,6 +353,7 @@ static bool blk_mq_sched_bypass_insert(struct blk_mq_hw_ctx *hctx,
> >  	 */
> >  	spin_lock(&hctx->lock);
> >  	list_add(&rq->queuelist, &hctx->dispatch);
> > +	set_bit(BLK_MQ_S_DISPATCH_BUSY, &hctx->state);
> >  	spin_unlock(&hctx->lock);
> >  	return true;
> >  }
>
> Is it necessary to make blk_mq_sched_bypass_insert() set BLK_MQ_S_DISPATCH_BUSY?
> My understanding is that only code that makes the dispatch list empty should
> set BLK_MQ_S_DISPATCH_BUSY. However, blk_mq_sched_bypass_insert() adds an element
> to the dispatch list so that guarantees that that list is not empty.

I believe my above comment has explained it already.
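To make that concrete, here is a rough sketch of the invariant the patch
aims for. This is a paraphrase, not the actual patch: BLK_MQ_S_DISPATCH_BUSY
comes from this series, and dispatch_list_add()/may_dequeue_from_sched()
are made-up helper names used only for illustration:

/*
 * Sketch only: the rule is that DISPATCH_BUSY is set in the same
 * critical section that adds a request to hctx->dispatch, and is only
 * cleared once ->dispatch has been dispatched successfully.
 */
#include <linux/blk-mq.h>
#include <linux/list.h>

/* Producer side: every path that puts a request on ->dispatch marks
 * the hctx busy inside the same critical section. */
static void dispatch_list_add(struct blk_mq_hw_ctx *hctx, struct request *rq)
{
	spin_lock(&hctx->lock);
	list_add(&rq->queuelist, &hctx->dispatch);
	/*
	 * A run queue from another context may have just spliced
	 * ->dispatch to a private list and hence sees it empty, but it
	 * still observes this bit and backs off from dequeuing the
	 * sw/scheduler queues while the lld queue may be busy.
	 */
	set_bit(BLK_MQ_S_DISPATCH_BUSY, &hctx->state);
	spin_unlock(&hctx->lock);
}

/* Consumer side: checking the bit alone is enough, no extra
 * list_empty_careful(&hctx->dispatch) test is needed. */
static bool may_dequeue_from_sched(struct blk_mq_hw_ctx *hctx)
{
	return !test_bit(BLK_MQ_S_DISPATCH_BUSY, &hctx->state);
}

The bit is then only cleared after the dispatch loop has drained
->dispatch without the driver reporting busy, which is what the comment
in the blk-mq.c hunk below refers to.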
>
> > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > index f063dd0f197f..6af56a71c1cd 100644
> > --- a/block/blk-mq.c
> > +++ b/block/blk-mq.c
> > @@ -1140,6 +1140,11 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list)
> >
> >  		spin_lock(&hctx->lock);
> >  		list_splice_init(list, &hctx->dispatch);
> > +		/*
> > +		 * DISPATCH_BUSY won't be cleared until all requests
> > +		 * in hctx->dispatch are dispatched successfully
> > +		 */
> > +		set_bit(BLK_MQ_S_DISPATCH_BUSY, &hctx->state);
> >  		spin_unlock(&hctx->lock);
>
> Same comment here - since this code adds one or more requests to the dispatch list,
> is it really needed to set the DISPATCH_BUSY flag?

See same comment above, :-)

--
Ming