On 5/19/24 20:38, Yang Yang wrote:
> The depth is 62, and the wake_batch is 8. In the following situation,
> the task would hang forever.
>
> t1:                  t2:                         t3:
> blk_mq_get_tag       .                           .
> io_schedule          .                           .
>                      elevator_switch             .
>                      blk_mq_freeze_queue         .
>                      blk_freeze_queue_start      .
>                      blk_mq_freeze_queue_wait    .
>                                                  blk_mq_submit_bio
>                                                  __bio_queue_enter
>
> Fix this issue by waking up all the waiters sleeping on tags after
> freezing the queue.
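[As an illustration of the scenario quoted above, here is a hypothetical
userspace model of the circular wait. It is not kernel code: every name in
it (usage_count, frozen, tag_wq, and so on) is invented for illustration,
and the "fewer than wake_batch tags are freed" detail is collapsed into
"no tag wakeup ever arrives". A plain sleep() stands in for a watchdog so
the demo terminates instead of hanging.]

/* Hypothetical model only -- all names invented, none are kernel APIs. */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t tag_wq = PTHREAD_COND_INITIALIZER;    /* tag waiters */
static pthread_cond_t freeze_wq = PTHREAD_COND_INITIALIZER; /* mq_freeze_wq */

static int free_tags;   /* all tags are in use */
static int usage_count; /* models q_usage_counter */
static int frozen;      /* models mq_freeze_depth != 0 */

static void *t1_alloc_request(void *arg)        /* blk_mq_alloc_request() */
{
        (void)arg;
        pthread_mutex_lock(&lock);
        usage_count++;                          /* blk_queue_enter() */
        while (free_tags == 0)                  /* blk_mq_get_tag() */
                pthread_cond_wait(&tag_wq, &lock);  /* io_schedule() */
        free_tags--;
        usage_count--;
        pthread_mutex_unlock(&lock);
        return NULL;
}

static void *t2_elevator_switch(void *arg)      /* blk_mq_freeze_queue() */
{
        (void)arg;
        pthread_mutex_lock(&lock);
        frozen = 1;                             /* blk_freeze_queue_start() */
        while (usage_count != 0)                /* blk_mq_freeze_queue_wait() */
                pthread_cond_wait(&freeze_wq, &lock);
        pthread_mutex_unlock(&lock);
        return NULL;
}

static void *t3_submit_bio(void *arg)           /* blk_mq_submit_bio() */
{
        (void)arg;
        pthread_mutex_lock(&lock);
        while (frozen)                          /* __bio_queue_enter() */
                pthread_cond_wait(&freeze_wq, &lock);
        pthread_mutex_unlock(&lock);
        return NULL;
}

int main(void)
{
        pthread_t t1, t2, t3;

        pthread_create(&t1, NULL, t1_alloc_request, NULL);
        sleep(1);                 /* let t1 block on the tag wait queue */
        pthread_create(&t2, NULL, t2_elevator_switch, NULL);
        sleep(1);
        pthread_create(&t3, NULL, t3_submit_bio, NULL);

        sleep(3);                 /* watchdog instead of hanging forever */
        printf("t1 waits for a tag, t2 waits for usage_count == 0,\n"
               "t3 waits for the freeze to end: circular wait reproduced\n");
        return 0;
}

[Built with gcc -pthread, all three threads end up blocked on one another,
matching the t1/t2/t3 columns quoted above.]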
Shouldn't blk_mq_alloc_request() be mentioned in t1 since that is the function that calls blk_queue_enter()?
> diff --git a/block/blk-core.c b/block/blk-core.c
> index a16b5abdbbf5..e1eacfad6e5b 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -298,8 +298,6 @@ void blk_queue_start_drain(struct request_queue *q)
>           * prevent I/O from crossing blk_queue_enter().
>           */
>          blk_freeze_queue_start(q);
> -        if (queue_is_mq(q))
> -                blk_mq_wake_waiters(q);
>          /* Make blk_queue_enter() reexamine the DYING flag. */
>          wake_up_all(&q->mq_freeze_wq);
>  }
Why has blk_queue_start_drain() been modified? I don't see any reference in the patch description to blk_queue_start_drain(). Am I perhaps missing something?
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 4ecb9db62337..9eb3139e713a 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -125,8 +125,10 @@ void blk_freeze_queue_start(struct request_queue *q)
>          if (++q->mq_freeze_depth == 1) {
>                  percpu_ref_kill(&q->q_usage_counter);
>                  mutex_unlock(&q->mq_freeze_lock);
> -                if (queue_is_mq(q))
> +                if (queue_is_mq(q)) {
> +                        blk_mq_wake_waiters(q);
>                          blk_mq_run_hw_queues(q, false);
> +                }
>          } else {
>                  mutex_unlock(&q->mq_freeze_lock);
>          }
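[For reference, blk_mq_wake_waiters(), which this hunk starts calling from
blk_freeze_queue_start(), looks roughly like the following in current
mainline. This is recalled from memory, so treat it as a sketch; details
may differ between kernel versions.]

void blk_mq_wake_waiters(struct request_queue *q)
{
        struct blk_mq_hw_ctx *hctx;
        unsigned long i;

        /* Wake every task sleeping on any of the queue's tag wait queues. */
        queue_for_each_hw_ctx(q, hctx, i)
                if (blk_mq_hw_queue_mapped(hctx))
                        blk_mq_tag_wakeup_all(hctx->tags, true);
}

[blk_mq_tag_wakeup_all() in turn calls sbitmap_queue_wake_all() on the tag
wait queues, i.e. it wakes the waiters unconditionally rather than waiting
for the wake_batch threshold, which is presumably why the patch moves the
call here.]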
Why would the above change be necessary? If the blk_queue_enter() call by blk_mq_alloc_request() succeeds and blk_mq_get_tag() calls io_schedule(), the task sleeping in io_schedule() will be woken up indirectly by the blk_mq_run_hw_queues() call, because that call will free one of the tags that the sleeping task is waiting for.

Thanks,

Bart.
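[As a reference for the indirect wake-up path discussed above: the tag free
happens when a dispatched request completes and returns its tag via
blk_mq_put_tag(). The sketch below is paraphrased from memory rather than
copied from a specific tree, so exact details may differ.]

/* block/blk-mq-tag.c, roughly, from memory */
void blk_mq_put_tag(struct blk_mq_tags *tags, struct blk_mq_ctx *ctx,
                    unsigned int tag)
{
        if (!blk_mq_tag_is_reserved(tags, tag)) {
                const int real_tag = tag - tags->nr_reserved_tags;

                BUG_ON(real_tag >= tags->nr_tags);
                /* Clearing the bit is what may wake sleeping tag waiters. */
                sbitmap_queue_clear(&tags->bitmap_tags, real_tag, ctx->cpu);
        } else {
                sbitmap_queue_clear(&tags->breserved_tags, tag, ctx->cpu);
        }
}

[If I remember the sbitmap code correctly, sbitmap_queue_clear() wakes tag
waiters in batches governed by wake_batch, which is the value the patch
description refers to.]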