On 11/10/2017 03:28 PM, Omar Sandoval wrote:
> On Fri, Nov 10, 2017 at 09:12:18AM -0700, Jens Axboe wrote:
>> Currently we are inconsistent in when we decide to run the queue. Using
>> blk_mq_run_hw_queues() we check if the hctx has pending IO before
>> running it, but we don't do that from the individual queue run function,
>> blk_mq_run_hw_queue(). This results in a lot of extra and pointless
>> queue runs, potentially, on flush requests and (much worse) on tag
>> starvation situations. This is observable just looking at the top
>> output, with lots of kworkers active. For the !async runs, it just adds
>> to the CPU overhead of blk-mq.
>>
>> Move the has-pending check into the run function instead of having
>> callers do it.
>>
>> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
>>
>> ---
>>
>> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
>> index 6f4bdb8209f7..c117bd8fd1f6 100644
>> --- a/block/blk-mq-sched.c
>> +++ b/block/blk-mq-sched.c
>> @@ -81,12 +81,7 @@ static bool blk_mq_sched_restart_hctx(struct blk_mq_hw_ctx *hctx)
>>  	} else
>>  		clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
>>
>> -	if (blk_mq_hctx_has_pending(hctx)) {
>> -		blk_mq_run_hw_queue(hctx, true);
>> -		return true;
>> -	}
>> -
>> -	return false;
>> +	return blk_mq_run_hw_queue(hctx, true);
>>  }
>>
>>  /*
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index bfe24a5b62a3..a2a4271f5ab8 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -61,10 +61,10 @@ static int blk_mq_poll_stats_bkt(const struct request *rq)
>>  /*
>>   * Check if any of the ctx's have pending work in this hardware queue
>>   */
>> -bool blk_mq_hctx_has_pending(struct blk_mq_hw_ctx *hctx)
>> +static bool blk_mq_hctx_has_pending(struct blk_mq_hw_ctx *hctx)
>>  {
>> -	return sbitmap_any_bit_set(&hctx->ctx_map) ||
>> -		!list_empty_careful(&hctx->dispatch) ||
>> +	return !list_empty_careful(&hctx->dispatch) ||
>> +		sbitmap_any_bit_set(&hctx->ctx_map) ||
>>  		blk_mq_sched_has_work(hctx);
>>  }
>>
>> @@ -1253,9 +1253,14 @@ void blk_mq_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx, unsigned long msecs)
>>  }
>>  EXPORT_SYMBOL(blk_mq_delay_run_hw_queue);
>>
>> -void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
>> +bool blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
>>  {
>> -	__blk_mq_delay_run_hw_queue(hctx, async, 0);
>> +	if (blk_mq_hctx_has_pending(hctx)) {
>> +		__blk_mq_delay_run_hw_queue(hctx, async, 0);
>> +		return true;
>> +	}
>> +
>> +	return false;
>>  }
>>  EXPORT_SYMBOL(blk_mq_run_hw_queue);
>>
>> @@ -1265,8 +1270,7 @@ void blk_mq_run_hw_queues(struct request_queue *q, bool async)
>>  	int i;
>>
>>  	queue_for_each_hw_ctx(q, hctx, i) {
>> -		if (!blk_mq_hctx_has_pending(hctx) ||
>> -		    blk_mq_hctx_stopped(hctx))
>> +		if (blk_mq_hctx_stopped(hctx))
>>  			continue;
>>
>>  		blk_mq_run_hw_queue(hctx, async);
>
> Minor cosmetic point, I find this double-negative thing confusing,
> how about:
>
> 	queue_for_each_hw_ctx(q, hctx, i) {
> 		if (!blk_mq_hctx_stopped(hctx))
> 			blk_mq_run_hw_queue(hctx, async);

Really? It's an easier read for me - if stopped, continue. Then run
after that. !stopped seems like more of a double negative :-)

> Other than that,
>
> Reviewed-by: Omar Sandoval <osandov@xxxxxx>

Thanks for the review!

-- 
Jens Axboe
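
To make the resulting control flow concrete, here is a minimal standalone
sketch of what the patch converges on. This is a toy userspace model, not
kernel code: every name below is an illustrative stand-in for the
corresponding blk-mq function in the diff (run_hw_queue for
blk_mq_run_hw_queue, and so on), and the single has_pending flag stands in
for the dispatch list, ctx_map, and scheduler work checks.

/*
 * Toy model of the control flow after the patch (userspace, not kernel
 * code): the has-pending check lives inside the run function, which
 * reports whether a run actually happened.
 */
#include <stdbool.h>
#include <stdio.h>

struct hctx {
	bool stopped;
	bool has_pending;	/* stand-in for dispatch list / ctx_map / sched work */
};

/* Stand-in for blk_mq_hctx_has_pending(): is any work queued? */
static bool hctx_has_pending(const struct hctx *h)
{
	return h->has_pending;
}

/* Stand-in for __blk_mq_delay_run_hw_queue(): actually kick the queue. */
static void do_run(struct hctx *h)
{
	h->has_pending = false;
	printf("queue ran\n");
}

/*
 * Stand-in for blk_mq_run_hw_queue() after the patch: the pending check
 * is done here, once, and the caller learns whether a run happened.
 */
static bool run_hw_queue(struct hctx *h)
{
	if (hctx_has_pending(h)) {
		do_run(h);
		return true;
	}
	return false;
}

/* Stand-in for blk_mq_run_hw_queues(): only the stopped check remains. */
static void run_hw_queues(struct hctx *queues, int n)
{
	for (int i = 0; i < n; i++) {
		if (queues[i].stopped)
			continue;
		run_hw_queue(&queues[i]);
	}
}

int main(void)
{
	struct hctx queues[] = {
		{ .stopped = false, .has_pending = true  },	/* runs */
		{ .stopped = false, .has_pending = false },	/* skipped: no work */
		{ .stopped = true,  .has_pending = true  },	/* skipped: stopped */
	};

	run_hw_queues(queues, 3);	/* prints "queue ran" exactly once */
	return 0;
}

With this shape, a caller such as the scheduler restart path can simply
propagate the boolean return value, and the queue-wide loop needs only the
stopped check, which is what the two hunks in blk-mq.c implement.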