Re: [PATCH V2] md: don't unregister sync_thread with reconfig_mutex held

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



[Update Guoqing’s email address]

On 15.02.21 12:07, Paul Menzel wrote:
[+cc Donald]

Am 13.02.21 um 01:49 schrieb Guoqing Jiang:
Unregister sync_thread doesn't need to hold reconfig_mutex since it
doesn't reconfigure array.

And it could cause deadlock problem for raid5 as follows:

1. process A tried to reap sync thread with reconfig_mutex held after echo
    idle to sync_action.
2. raid5 sync thread was blocked if there were too many active stripes.
3. SB_CHANGE_PENDING was set (because of write IO comes from upper layer)
    which causes the number of active stripes can't be decreased.
4. SB_CHANGE_PENDING can't be cleared since md_check_recovery was not able
    to hold reconfig_mutex.

More details in the link:
https://lore.kernel.org/linux-raid/5ed54ffc-ce82-bf66-4eff-390cb23bc1ac@xxxxxxxxxxxxx/T/#t

And add one parameter to md_reap_sync_thread since it could be called by
dm-raid which doesn't hold reconfig_mutex.

Reported-and-tested-by: Donald Buczek <buczek@xxxxxxxxxxxxx>

Thanks, Paul, for putting me into the cc.

Guoqing, I don't think, I've tested this patch. Please remove the tested-by.

btw: We have the fix I suggested [1] running on 59 production raid6 sets with 16 disk each with various loads and with monthly mdcheck (paused during daytime, so a few transitions each month) on several kernel versions running for nearly a year now. Many more transitions during testing. That doesn't mean the fix is correct, of course. The configurations of our systems are almost identical and we don't do suspend or anything. But maybe you might want to reconsider.

[1]: https://lore.kernel.org/linux-raid/bc342de0-98d2-1733-39cd-cc1999777ff3@xxxxxxxxxxxxx/

If you want me to test V3 of your patch, please put me in the cc.

Best
  Donald

Signed-off-by: Guoqing Jiang <guoqing.jiang@xxxxxxxxxxxxxxx>
---
V2:
1. add one parameter to md_reap_sync_thread per Jack's suggestion.

  drivers/md/dm-raid.c |  2 +-
  drivers/md/md.c      | 14 +++++++++-----
  drivers/md/md.h      |  2 +-
  3 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
index cab12b2..0c4cbba 100644
--- a/drivers/md/dm-raid.c
+++ b/drivers/md/dm-raid.c
@@ -3668,7 +3668,7 @@ static int raid_message(struct dm_target *ti, unsigned int argc, char **argv,
      if (!strcasecmp(argv[0], "idle") || !strcasecmp(argv[0], "frozen")) {
          if (mddev->sync_thread) {
              set_bit(MD_RECOVERY_INTR, &mddev->recovery);
-            md_reap_sync_thread(mddev);
+            md_reap_sync_thread(mddev, false);
          }
      } else if (decipher_sync_action(mddev, mddev->recovery) != st_idle)
          return -EBUSY;
diff --git a/drivers/md/md.c b/drivers/md/md.c
index ca40942..0c12b7f 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -4857,7 +4857,7 @@ action_store(struct mddev *mddev, const char *page, size_t len)
                  flush_workqueue(md_misc_wq);
              if (mddev->sync_thread) {
                  set_bit(MD_RECOVERY_INTR, &mddev->recovery);
-                md_reap_sync_thread(mddev);
+                md_reap_sync_thread(mddev, true);
              }
              mddev_unlock(mddev);
          }
@@ -6234,7 +6234,7 @@ static void __md_stop_writes(struct mddev *mddev)
          flush_workqueue(md_misc_wq);
      if (mddev->sync_thread) {
          set_bit(MD_RECOVERY_INTR, &mddev->recovery);
-        md_reap_sync_thread(mddev);
+        md_reap_sync_thread(mddev, true);
      }
      del_timer_sync(&mddev->safemode_timer);
@@ -9256,7 +9256,7 @@ void md_check_recovery(struct mddev *mddev)
               * ->spare_active and clear saved_raid_disk
               */
              set_bit(MD_RECOVERY_INTR, &mddev->recovery);
-            md_reap_sync_thread(mddev);
+            md_reap_sync_thread(mddev, true);
              clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
              clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
              clear_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags);
@@ -9291,7 +9291,7 @@ void md_check_recovery(struct mddev *mddev)
              goto unlock;
          }
          if (mddev->sync_thread) {
-            md_reap_sync_thread(mddev);
+            md_reap_sync_thread(mddev, true);
              goto unlock;
          }
          /* Set RUNNING before clearing NEEDED to avoid
@@ -9364,14 +9364,18 @@ void md_check_recovery(struct mddev *mddev)
  }
  EXPORT_SYMBOL(md_check_recovery);
-void md_reap_sync_thread(struct mddev *mddev)
+void md_reap_sync_thread(struct mddev *mddev, bool reconfig_mutex_held)
  {
      struct md_rdev *rdev;
      sector_t old_dev_sectors = mddev->dev_sectors;
      bool is_reshaped = false;
      /* resync has finished, collect result */
+    if (reconfig_mutex_held)
+        mddev_unlock(mddev);
      md_unregister_thread(&mddev->sync_thread);
+    if (reconfig_mutex_held)
+        mddev_lock_nointr(mddev);
      if (!test_bit(MD_RECOVERY_INTR, &mddev->recovery) &&
          !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery) &&
          mddev->degraded != mddev->raid_disks) {
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 34070ab..7ae3d98 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -705,7 +705,7 @@ extern struct md_thread *md_register_thread(
  extern void md_unregister_thread(struct md_thread **threadp);
  extern void md_wakeup_thread(struct md_thread *thread);
  extern void md_check_recovery(struct mddev *mddev);
-extern void md_reap_sync_thread(struct mddev *mddev);
+extern void md_reap_sync_thread(struct mddev *mddev, bool reconfig_mutex_held);
  extern int mddev_init_writes_pending(struct mddev *mddev);
  extern bool md_write_start(struct mddev *mddev, struct bio *bi);
  extern void md_write_inc(struct mddev *mddev, struct bio *bi);


--
Donald Buczek
buczek@xxxxxxxxxxxxx
Tel: +49 30 8413 1433



[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux