On Sun, Dec 3, 2023 at 7:18 PM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote: > > From: Yu Kuai <yukuai3@xxxxxxxxxx> > > New mddev_resume() calls are added to synchroniza IO with array > reconfiguration, however, this introduce a regression while adding it in > md_start_sync(): > > 1) someone set MD_RECOVERY_NEEDED first; > 2) daemon thread grab reconfig_mutex, then clear MD_RECOVERY_NEEDED and > queue a new sync work; > 3) daemon thread release reconfig_mutex; > 4) in md_start_sync > a) check that there are spares that can be added/removed, then suspend > the array; > b) remove_and_add_spares may not be called, or called without really > add/remove spares; > c) resume the array, then set MD_RECOVERY_NEEDED again! > > Loop between 2 - 4, then mddev_suspend() will be called quite often, for > consequence, normal IO will be quite slow. > > Fix this problem by spliting MD_RECOVERY_NEEDED out of mddev_resume(), so > that md_start_sync() won't set such flag and hence the loop will be broken. I hope we don't leak set_bit MD_RECOVERY_NEEDED to all call sites of mddev_resume(). How about something like the following instead? Please also incorporate feedback from Paul in the next version. Thanks, Song diff --git i/drivers/md/md.c w/drivers/md/md.c index c94373d64f2c..2d53e1b57070 100644 --- i/drivers/md/md.c +++ w/drivers/md/md.c @@ -490,7 +490,7 @@ int mddev_suspend(struct mddev *mddev, bool interruptible) } EXPORT_SYMBOL_GPL(mddev_suspend); -void mddev_resume(struct mddev *mddev) +static void __mddev_resume(struct mddev *mddev, bool recovery_needed) { lockdep_assert_not_held(&mddev->reconfig_mutex); @@ -507,12 +507,18 @@ void mddev_resume(struct mddev *mddev) percpu_ref_resurrect(&mddev->active_io); wake_up(&mddev->sb_wait); - set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); + if (recovery_needed) + set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); md_wakeup_thread(mddev->thread); md_wakeup_thread(mddev->sync_thread); /* possibly kick off a reshape */ mutex_unlock(&mddev->suspend_mutex); } + +void mddev_resume(struct mddev *mddev) +{ + __mddev_resume(mddev, true); +} EXPORT_SYMBOL_GPL(mddev_resume); /* @@ -9403,7 +9409,9 @@ static void md_start_sync(struct work_struct *ws) goto not_running; } - suspend ? mddev_unlock_and_resume(mddev) : mddev_unlock(mddev); + mddev_unlock(mddev); + if (suspend) + __mddev_resume(mddev, false); md_wakeup_thread(mddev->sync_thread); sysfs_notify_dirent_safe(mddev->sysfs_action); md_new_event(); @@ -9415,7 +9423,9 @@ static void md_start_sync(struct work_struct *ws) clear_bit(MD_RECOVERY_REQUESTED, &mddev->recovery); clear_bit(MD_RECOVERY_CHECK, &mddev->recovery); clear_bit(MD_RECOVERY_RUNNING, &mddev->recovery); - suspend ? mddev_unlock_and_resume(mddev) : mddev_unlock(mddev); + mddev_unlock(mddev); + if (suspend) + __mddev_resume(mddev, false); wake_up(&resync_wait); if (test_and_clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery) &&