On Tue, Jan 30, 2018 at 5:22 AM, Bart Van Assche <Bart.VanAssche@xxxxxxx> wrote: > On Mon, 2018-01-29 at 15:33 -0500, Mike Snitzer wrote: >> + * If driver returns BLK_STS_RESOURCE and SCHED_RESTART >> + * bit is set, run queue after 10ms to avoid IO stalls >> + * that could otherwise occur if the queue is idle. >> */ >> - if (!blk_mq_sched_needs_restart(hctx) || >> + needs_restart = blk_mq_sched_needs_restart(hctx); >> + if (!needs_restart || >> (no_tag && list_empty_careful(&hctx->dispatch_wait.entry))) >> blk_mq_run_hw_queue(hctx, true); >> + else if (needs_restart && (ret == BLK_STS_RESOURCE)) >> + blk_mq_delay_run_hw_queue(hctx, BLK_MQ_QUEUE_DELAY); >> } > > The above code only calls blk_mq_delay_run_hw_queue() if the following condition > is met: !(!needs_restart || (no_tag && list_empty_careful(&hctx->dispatch_wait.entry))) > && (needs_restart && (ret == BLK_STS_RESOURCE)). That boolean expression can be > simplified into the following: needs_restart && ret == BLK_STS_RESOURCE && > !(no_tag && list_empty_careful(&hctx->dispatch_wait.entry)). From blk-mq-sched.h: No, the expression of (needs_restart && (ret == BLK_STS_RESOURCE)) clearly expresses what we want to do. When RESTART is working, and meantime BLK_STS_RESOURCE is returned from driver, we need to avoid IO hang by blk_mq_delay_run_hw_queue(). > > static inline bool blk_mq_sched_needs_restart(struct blk_mq_hw_ctx *hctx) > { > return test_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state); > } > > In other words, the above code will not call blk_mq_delay_run_hw_queue() if > BLK_MQ_S_SCHED_RESTART is cleared after it got set and before the above code is > reached. The BLK_MQ_S_SCHED_RESTART bit gets cleared if a request completes > concurrently with the above code. From blk-mq-sched.c: That won't be an issue, given any time the SCHED_RESTART is cleared, the queue will be run again, so we needn't to worry about that. > > static bool blk_mq_sched_restart_hctx(struct blk_mq_hw_ctx *hctx) > { > if (!test_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state)) > return false; > > if (hctx->flags & BLK_MQ_F_TAG_SHARED) { > struct request_queue *q = hctx->queue; > > if (test_and_clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state)) > atomic_dec(&q->shared_hctx_restart); > } else > clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state); > > return blk_mq_run_hw_queue(hctx, true); > } > > The above blk_mq_run_hw_queue() call may finish either before or after If the above blk_mq_run_hw_queue() finishes before blk_mq_dispatch_rq_list() examines the flag, blk_mq_dispatch_rq_list() will see the flag cleared, and run queue again by the following code branch: if (!blk_mq_sched_needs_restart(hctx) || (no_tag && list_empty_careful(&hctx->dispatch_wait.entry))) blk_mq_run_hw_queue(hctx, true); If blk_mq_run_hw_queue() finishes after blk_mq_dispatch_rq_list() examines the flag, this blk_mq_run_hw_queue() will see the new added request, and still everything is fine. Even blk_mq_delay_run_hw_queue() can be started too. > blk_mq_dispatch_rq_list() examines the BLK_MQ_S_SCHED_RESTART flag. > > That's why I wrote in previous e-mails that this patch and also the previous > versions of this patch change a systematic call of blk_mq_delay_run_hw_queue() > into a call that may or may not happen, depending on whether or not a request > completes concurrently with request queueing. Sorry but I think that means > that the above change combined with changing BLK_STS_RESOURCE into > BLK_STS_DEV_RESOURCE is wrong and why I expect that this will result in > sporadic queue stalls. As you know sporadic queue stalls are annoying and hard > to debug. Now there is debugfs, it isn't difficult to deal with that if reporter'd like to cowork. > > Plenty of e-mails have already been exchanged about versions one to four of this > patch but so far nobody has even tried to contradict the above. No, I don't see the issue, let's revisit the main change again: + needs_restart = blk_mq_sched_needs_restart(hctx); + if (!needs_restart || (no_tag && list_empty_careful(&hctx->dispatch_wait.entry))) blk_mq_run_hw_queue(hctx, true); + else if (needs_restart && (ret == BLK_STS_RESOURCE)) + blk_mq_delay_run_hw_queue(hctx, 10); If SCHED_RESTART isn't set, queue is run immediately, otherwise when BLK_STS_RESOURCE is returned, we avoid IO hang by blk_mq_delay_run_hw_queue(hctx, XXX). And we don't cover BLK_STS_DEV_RESOURCE above because it is documented clearly that BLK_STS_DEV_RESOURCE is only returned by driver iff queue will be rerun in future if resource is available. Is there any hole in above code? Thanks, Ming Lei