> > Hi Ming - > > > > There is still some race which is not handled. Take a case of IO is > > not able to get budget and it has already marked <restarts> flag. > > <restarts> flag will be seen non-zero in completion path and > > completion path will attempt h/w queue run. (But this particular IO is > > still not in s/w queue.). > > Attempt of running h/w queue from completion path will not flush any > > IO since there is no IO in s/w queue. > > Then where is the IO to be submitted in case of running out of budget? Typical race in your latest patch is - (Lets consider command A,B and C) Command A did not receive budget. Command B completed (which was already submitted earlier) at the same time and it make sdev->device_busy = 0 from " scsi_finish_command". Command B has still not called "scsi_end_request". Command C get the budget and it will make sdev->device_busy = 1. Now, Command A set set sdev->restarts flags but will not run h/w queue since sdev->device_busy = 1. Command B run h/w queue (make sdev->restart = 0) from completion path, but command -A is still not in the s/w queue. Command-A is in now in s/w queue. Command-C completed but it will not run h/w queue because sdev->restarts = 0. > > Any IO request which is going to be added to hctx->dispatch, the queue will be > re-run via blk-mq core. > > Any IO request being issued directly when running out of budget will be insert > to hctx->dispatch or sw/scheduler queue, will be run in the submission path. I have *not* included below changes we discussed in my testing - If I include below patch, it is correct that queue will be run in submission path (at least the path which is impacted in my testing). You have already mentioned that most of the submission path has fix now in latest kernel w.r.t running h/w queue from submission path. Below path is missing for running h/w queue from submission path. diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c index 54f9015..bcfd33a 100644 --- a/block/blk-mq-sched.c +++ b/block/blk-mq-sched.c @@ -173,8 +173,10 @@ static int blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx) if (!sbitmap_any_bit_set(&hctx->ctx_map)) break; - if (!blk_mq_get_dispatch_budget(hctx)) + if (!blk_mq_get_dispatch_budget(hctx)) { + blk_mq_delay_run_hw_queue(hctx, + BLK_MQ_BUDGET_DELAY); break; + } rq = blk_mq_dequeue_from_ctx(hctx, ctx); if (!rq) { Are you saying above fix should be included along with your latest patch ? > > > Thanks, > Ming