This is a note to let you know that I've just added the patch titled

    md: Don't suspend the array for interrupted reshape

to the 6.7-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     md-don-t-suspend-the-array-for-interrupted-reshape.patch
and it can be found in the queue-6.7 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.


From 9e46c70e829bddc24e04f963471e9983a11598b7 Mon Sep 17 00:00:00 2001
From: Yu Kuai <yukuai3@xxxxxxxxxx>
Date: Thu, 1 Feb 2024 17:25:50 +0800
Subject: md: Don't suspend the array for interrupted reshape

From: Yu Kuai <yukuai3@xxxxxxxxxx>

commit 9e46c70e829bddc24e04f963471e9983a11598b7 upstream.

md_start_sync() suspends the array if there are spares that can be added
to or removed from conf. However, while a reshape is still in progress,
spares are never added or removed, because doing so would corrupt data
(remove_and_add_spares() is not called from md_choose_sync_action() for
reshape). Hence there is no need to suspend the array if the reshape is
not done yet.

Meanwhile, there is a potential deadlock for raid456:

1) reshape is interrupted;

2) set WantReplacement on one of the disks and add a new disk to the
   array; recovery won't start until the reshape is finished;

3) then issue an IO across the reshape position; this IO will wait for
   the reshape to make progress;

4) continue the reshape; md_start_sync() then finds a spare disk that
   can be added to conf, and mddev_suspend() is called;

Step 4 and step 3 are waiting for each other, so a deadlock is triggered.
Note that this problem was found by code review and has not been
reproduced yet.

Fix this problem by not suspending the array for an interrupted reshape.
This is safe because conf won't be changed until the reshape is done.

Fixes: bc08041b32ab ("md: suspend array in md_start_sync() if array need reconfiguration")
Cc: stable@xxxxxxxxxxxxxxx # v6.7+
Signed-off-by: Yu Kuai <yukuai3@xxxxxxxxxx>
Signed-off-by: Song Liu <song@xxxxxxxxxx>
Link: https://lore.kernel.org/r/20240201092559.910982-6-yukuai1@xxxxxxxxxxxxxxx
Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
---
 drivers/md/md.c |   13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9424,12 +9424,17 @@ static void md_start_sync(struct work_st
 	bool suspend = false;
 	char *name;
 
-	if (md_spares_need_change(mddev))
+	/*
+	 * If reshape is still in progress, spares won't be added or removed
+	 * from conf until reshape is done.
+	 */
+	if (mddev->reshape_position == MaxSector &&
+	    md_spares_need_change(mddev)) {
 		suspend = true;
+		mddev_suspend(mddev, false);
+	}
 
-	suspend ? mddev_suspend_and_lock_nointr(mddev) :
-		  mddev_lock_nointr(mddev);
-
+	mddev_lock_nointr(mddev);
 	if (!md_is_rdwr(mddev)) {
 		/*
 		 * On a read-only array we can:


Patches currently in stable-queue which might be from yukuai3@xxxxxxxxxx are

queue-6.7/md-fix-missing-release-of-active_io-for-flush.patch
queue-6.7/md-don-t-register-sync_thread-for-reshape-directly.patch
queue-6.7/md-make-sure-md_do_sync-will-set-md_recovery_done.patch
queue-6.7/md-don-t-suspend-the-array-for-interrupted-reshape.patch
queue-6.7/md-don-t-ignore-suspended-array-in-md_check_recovery.patch
queue-6.7/md-don-t-ignore-read-only-array-in-md_check_recovery.patch
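
For readers who want to see the changed condition in isolation, the following is a minimal userspace sketch of the suspend decision before and after this patch. It is not kernel code: struct mddev_model, should_suspend_old(), should_suspend_new() and demo_interrupted_reshape() are hypothetical names introduced only for illustration, the spares_need_change flag stands in for the result of md_spares_need_change(), and the sector_t/MaxSector definitions are local stand-ins assumed to mirror the kernel's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins (assumptions), not the real kernel headers. */
typedef uint64_t sector_t;
#define MaxSector (~(sector_t)0)

/* Hypothetical, reduced model of struct mddev: only what the check reads. */
struct mddev_model {
	sector_t reshape_position;	/* MaxSector when no reshape is pending */
	bool spares_need_change;	/* stands in for md_spares_need_change() */
};

/* Before the patch: suspend whenever spares could be added or removed. */
static bool should_suspend_old(const struct mddev_model *m)
{
	return m->spares_need_change;
}

/* After the patch: never suspend while a reshape is still unfinished. */
static bool should_suspend_new(const struct mddev_model *m)
{
	return m->reshape_position == MaxSector && m->spares_need_change;
}

/* Step 4 of the scenario: reshape interrupted, a spare is waiting. */
static void demo_interrupted_reshape(void)
{
	struct mddev_model m = {
		.reshape_position = 4096,	/* arbitrary mid-reshape value */
		.spares_need_change = true,
	};

	assert(should_suspend_old(&m));		/* old code would suspend */
	assert(!should_suspend_new(&m));	/* patched code does not */
}
```

In this model the two predicates differ only when reshape_position is not MaxSector, which corresponds to the interrupted-reshape case described in the commit message; for an array with no pending reshape both return the same answer.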