Re: [PATCH V2 2/2] Revert "md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d"

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

在 2023/11/08 5:24, Junxiao Bi 写道:
This reverts commit 5e2cf333b7bd5d3e62595a44d598a254c697cd74.

That commit introduced the following race and can cause system hung.

  md_write_start:                                                                        raid5d:
  if (mddev->safemode == 1)
      mddev->safemode = 0;
  /* sync_checkers is always 0 when writes_pending is in per-cpu mode */
  if (mddev->in_sync || mddev->sync_checkers) {
      spin_lock(&mddev->lock);
      if (mddev->in_sync) {
          mddev->in_sync = 0;
          set_bit(MD_SB_CHANGE_CLEAN, &mddev->sb_flags);
          set_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags);
                                                                                         >>> running before md_write_start wake up it
                                                                                         if (mddev->sb_flags & ~(1 << MD_SB_CHANGE_PENDING)) {
                                                                                               spin_unlock_irq(&conf->device_lock);
                                                                                               md_check_recovery(mddev);
                                                                                               spin_lock_irq(&conf->device_lock);

                                                                                               /*
                                                                                                * Waiting on MD_SB_CHANGE_PENDING below may deadlock
                                                                                                * seeing md_check_recovery() is needed to clear
                                                                                                * the flag when using mdmon.
                                                                                                */
                                                                                               continue;
                                                                                          }

                                                                                          wait_event_lock_irq(mddev->sb_wait,     >>>>>>>>>>> hung
                                                                                              !test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags),
                                                                                              conf->device_lock);
          md_wakeup_thread(mddev->thread);
          did_change = 1;
      }
      spin_unlock(&mddev->lock);
  }


Forgot to mention this, above extrem long lines really is not good,
please fix it. I think following should be enough:

raid5d			md_write_start
 if (mddev->sb_flags & ~(1 << MD_SB_CHANGE_PENDING))
 // failed
			 set_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags)
			  md_wakeup_thread
 wait_event_lock_irq
 /*
  * wait for MD_SB_CHANGE_PENDING to be cleared, while
  * md_write_start expect daemon thread to clear it.
  */

Thanks,
Kuai		
  ...

  wait_event(mddev->sb_wait,    >>>>>>>>>> hung
     !test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags) ||
     mddev->suspended);

The issue that reverted commit is fixing is fixed by last patch in a new way.

Fixes: 5e2cf333b7bd ("md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d")
Signed-off-by: Junxiao Bi <junxiao.bi@xxxxxxxxxx>
---
  drivers/md/raid5.c | 12 ------------
  1 file changed, 12 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index dc031d42f53b..fcc8a44dd4fd 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -36,7 +36,6 @@
   */
#include <linux/blkdev.h>
-#include <linux/delay.h>
  #include <linux/kthread.h>
  #include <linux/raid/pq.h>
  #include <linux/async_tx.h>
@@ -6820,18 +6819,7 @@ static void raid5d(struct md_thread *thread)
  			spin_unlock_irq(&conf->device_lock);
  			md_check_recovery(mddev);
  			spin_lock_irq(&conf->device_lock);
-
-			/*
-			 * Waiting on MD_SB_CHANGE_PENDING below may deadlock
-			 * seeing md_check_recovery() is needed to clear
-			 * the flag when using mdmon.
-			 */
-			continue;
  		}
-
-		wait_event_lock_irq(mddev->sb_wait,
-			!test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags),
-			conf->device_lock);
  	}
  	pr_debug("%d stripes handled\n", handled);




[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux