On Mon, 2018-01-29 at 15:33 -0500, Mike Snitzer wrote: > + * If driver returns BLK_STS_RESOURCE and SCHED_RESTART > + * bit is set, run queue after 10ms to avoid IO stalls > + * that could otherwise occur if the queue is idle. > */ > - if (!blk_mq_sched_needs_restart(hctx) || > + needs_restart = blk_mq_sched_needs_restart(hctx); > + if (!needs_restart || > (no_tag && list_empty_careful(&hctx->dispatch_wait.entry))) > blk_mq_run_hw_queue(hctx, true); > + else if (needs_restart && (ret == BLK_STS_RESOURCE)) > + blk_mq_delay_run_hw_queue(hctx, BLK_MQ_QUEUE_DELAY); > } The above code only calls blk_mq_delay_run_hw_queue() if the following condition is met: !(!needs_restart || (no_tag && list_empty_careful(&hctx->dispatch_wait.entry))) && (needs_restart && (ret == BLK_STS_RESOURCE)). That boolean expression can be simplified into the following: needs_restart && ret == BLK_STS_RESOURCE && !(no_tag && list_empty_careful(&hctx->dispatch_wait.entry)). From blk-mq-sched.h: static inline bool blk_mq_sched_needs_restart(struct blk_mq_hw_ctx *hctx) { return test_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state); } In other words, the above code will not call blk_mq_delay_run_hw_queue() if BLK_MQ_S_SCHED_RESTART is cleared after it got set and before the above code is reached. The BLK_MQ_S_SCHED_RESTART bit gets cleared if a request completes concurrently with the above code. From blk-mq-sched.c: static bool blk_mq_sched_restart_hctx(struct blk_mq_hw_ctx *hctx) { if (!test_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state)) return false; if (hctx->flags & BLK_MQ_F_TAG_SHARED) { struct request_queue *q = hctx->queue; if (test_and_clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state)) atomic_dec(&q->shared_hctx_restart); } else clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state); return blk_mq_run_hw_queue(hctx, true); } The above blk_mq_run_hw_queue() call may finish either before or after blk_mq_dispatch_rq_list() examines the BLK_MQ_S_SCHED_RESTART flag. That's why I wrote in previous e-mails that this patch and also the previous versions of this patch change a systematic call of blk_mq_delay_run_hw_queue() into a call that may or may not happen, depending on whether or not a request completes concurrently with request queueing. Sorry but I think that means that the above change combined with changing BLK_STS_RESOURCE into BLK_STS_DEV_RESOURCE is wrong and why I expect that this will result in sporadic queue stalls. As you know sporadic queue stalls are annoying and hard to debug. Plenty of e-mails have already been exchanged about versions one to four of this patch but so far nobody has even tried to contradict the above. Bart.