[PATCH] md/raid1: freeze block layer queue during reshape

Xueshi Hu <xueshi.hu@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> · Sun, 2 Jul 2023 18:04:25 +0800

When a raid device is reshaped, in-flight bios may reference an outdated
r1conf::raid_disks and r1bio::poolinfo. This can trigger a bug in
three possible paths:

1. In function "raid1d". If a bio fails to submit, it is handed back to
raid1d to retry the submission, which increments r1conf::nr_queued.
If a reshape happens in the meantime, the in-flight bio cannot be freed
normally because the old mempool has already been destroyed.
2. In raid1_write_request. If one raw device is blocked, the kernel calls
allow_barrier() and waits for the raw device to become ready, which makes
a raid reshape possible. After that, the local variable "disks" assigned
before the label "retry_write" is outdated. Additionally, the kernel
cannot reuse the old r1bio.
3. In raid_end_bio_io. The kernel must free the r1bio first and only then
allow the barrier.

By freezing the queue, we can ensure that there are no in-flight bios
during reshape. This prevents any bio from referencing the outdated
r1conf::raid_disks or r1bio::poolinfo.

Signed-off-by: Xueshi Hu <xueshi.hu@xxxxxxxxxx>
---
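Note (not part of the commit): the ordering argument above can be
illustrated with a toy userspace model. This is only a sketch under
stated assumptions, not kernel code. Every toy_* name is hypothetical,
and unlike blk_mq_freeze_queue(), which sleeps until the queue has
drained, toy_freeze() merely reports whether it has.

```c
/* Toy userspace model, NOT kernel code. All toy_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

struct toy_array {
	int raid_disks;	/* stands in for r1conf::raid_disks */
	int inflight;	/* bios that entered and have not completed yet */
	bool frozen;	/* true while a reshape keeps the queue frozen */
};

/*
 * A bio may enter only while the queue is not frozen. It snapshots
 * raid_disks, much like the local "disks" in raid1_write_request.
 */
static bool toy_enter(struct toy_array *a, int *disks_snapshot)
{
	if (a->frozen)
		return false;
	a->inflight++;
	*disks_snapshot = a->raid_disks;
	return true;
}

/* Completion of a bio that previously entered. */
static void toy_exit(struct toy_array *a)
{
	assert(a->inflight > 0);
	a->inflight--;
}

/* Refuse new bios from now on; report whether the queue is drained. */
static bool toy_freeze(struct toy_array *a)
{
	a->frozen = true;
	return a->inflight == 0;
}

static void toy_unfreeze(struct toy_array *a)
{
	a->frozen = false;
}

/* The geometry may change only when frozen AND drained. */
static int toy_reshape(struct toy_array *a, int new_disks)
{
	if (!a->frozen || a->inflight != 0)
		return -1;
	a->raid_disks = new_disks;
	return 0;
}
```

In this model a bio that entered before the freeze keeps toy_reshape()
from succeeding until it calls toy_exit(), so no snapshot of raid_disks
can outlive a change of geometry.
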
 drivers/md/raid1.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index dd25832eb045..d8d6825d0af6 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -3247,6 +3247,7 @@ static int raid1_reshape(struct mddev *mddev)
 	unsigned long flags;
 	int d, d2;
 	int ret;
+	struct request_queue *q = mddev->queue;
 
 	memset(&newpool, 0, sizeof(newpool));
 	memset(&oldpool, 0, sizeof(oldpool));
@@ -3296,6 +3297,7 @@ static int raid1_reshape(struct mddev *mddev)
 		return -ENOMEM;
 	}
 
+	blk_mq_freeze_queue(q);
 	freeze_array(conf, 0);
 
 	/* ok, everything is stopped */
@@ -3333,6 +3335,7 @@ static int raid1_reshape(struct mddev *mddev)
 	md_wakeup_thread(mddev->thread);
 
 	mempool_exit(&oldpool);
+	blk_mq_unfreeze_queue(q);
 	return 0;
 }
 
-- 
2.40.1