在 2024/11/06 17:51, Xiao Ni 写道:
One customer reports a bug: raid5 is hung when changing thread cnt
while resync is running. The stripes are all in conf->handle_list
and new threads can't handle them.
Commit b39f35ebe86d ("md: don't quiesce in mddev_suspend()") removes
pers->quiesce from mddev_suspend/resume. Before this patch, mddev_suspend
needs to wait for all ios including sync io to finish. Now it's used
to only wait normal io.
In this patch, it calls raid5_quiesce in raid5_store_group_thread_cnt
directly to wait all sync requests to finish before changing the group
cnt.
Fixes: b39f35ebe86d ("md: don't quiesce in mddev_suspend()")
Signed-off-by: Xiao Ni <xni@xxxxxxxxxx>
---
drivers/md/raid5.c | 4 ++++
1 file changed, 4 insertions(+)
LGTM
Reviewed-by: Yu Kuai <yukuai3@xxxxxxxxxx>
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index dc2ea636d173..2fa1f270fb1d 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -7177,6 +7177,8 @@ raid5_store_group_thread_cnt(struct mddev *mddev, const char *page, size_t len)
err = mddev_suspend_and_lock(mddev);
if (err)
return err;
+ raid5_quiesce(mddev, true);
+
conf = mddev->private;
if (!conf)
err = -ENODEV;
@@ -7198,6 +7200,8 @@ raid5_store_group_thread_cnt(struct mddev *mddev, const char *page, size_t len)
kfree(old_groups);
}
}
+
+ raid5_quiesce(mddev, false);
mddev_unlock_and_resume(mddev);
return err ?: len;