Re: [RFC PATCH] blk-mq: fix potential I/O hang caused by batch wakeup

Bart Van Assche <bvanassche@xxxxxxx> · Mon, 20 May 2024 11:11:52 -0700

On 5/19/24 20:38, Yang Yang wrote:
The depth is 62, and the wake_batch is 8. In the following situation,
the task would hang forever.

   t1:                 t2:                          t3:
   blk_mq_get_tag      .                            .
   io_schedule         .                            .
                       elevator_switch              .
                       blk_mq_freeze_queue          .
                       blk_freeze_queue_start       .
                       blk_mq_freeze_queue_wait     .
                                                    blk_mq_submit_bio
                                                    __bio_queue_enter

Fix this issue by waking up all the waiters sleeping on tags after
freezing the queue.

Shouldn't blk_mq_alloc_request() be mentioned in t1 since that is the function
that calls blk_queue_enter()?

diff --git a/block/blk-core.c b/block/blk-core.c
index a16b5abdbbf5..e1eacfad6e5b 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -298,8 +298,6 @@ void blk_queue_start_drain(struct request_queue *q)
  	 * prevent I/O from crossing blk_queue_enter().
  	 */
  	blk_freeze_queue_start(q);
-	if (queue_is_mq(q))
-		blk_mq_wake_waiters(q);
  	/* Make blk_queue_enter() reexamine the DYING flag. */
  	wake_up_all(&q->mq_freeze_wq);
  }

Why has blk_queue_start_drain() been modified? I don't see any reference
in the patch description to blk_queue_start_drain(). Am I perhaps missing
something?

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 4ecb9db62337..9eb3139e713a 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -125,8 +125,10 @@ void blk_freeze_queue_start(struct request_queue *q)
  	if (++q->mq_freeze_depth == 1) {
  		percpu_ref_kill(&q->q_usage_counter);
  		mutex_unlock(&q->mq_freeze_lock);
-		if (queue_is_mq(q))
+		if (queue_is_mq(q)) {
+			blk_mq_wake_waiters(q);
  			blk_mq_run_hw_queues(q, false);
+		}
  	} else {
  		mutex_unlock(&q->mq_freeze_lock);
  	}

Why would the above change be necessary? If the blk_queue_enter() call
by blk_mq_alloc_request() succeeds and blk_mq_get_tag() calls
io_schedule(), io_schedule() will be woken up indirectly by the
blk_mq_run_hw_queues() call because that call will free one of the tags
that the io_schedule() call is waiting for.

Thanks,

Bart.