Re: raid5 reshape is stuck when raid5 journal device miss

On 08/30/2018 02:02 AM, Shaohua Li wrote:
On Fri, Aug 24, 2018 at 03:53:59AM -0400, Xiao Ni wrote:
Hi all

Reshape can get stuck during a raid5 reshape when the raid5 journal device is missing. It can
be reproduced 100% of the time.

The test steps are:
1. mdadm -CR /dev/md0 -l5 -n4 /dev/sd[b-e]1 --write-journal /dev/sdf1
2. mdadm --wait /dev/md0
3. mdadm /dev/md0 -f /dev/sdf1
4. mdadm /dev/md0 -r /dev/sdf1
5. mdadm /dev/md0 -a /dev/sdf1
6. mdadm -G -n5 /dev/md0

A reshape request has 4 steps:
1. read data from the source stripes
2. write the source stripe data to the target stripes
3. calculate parity for the target stripes
4. write the target stripes to disks

After step 3:
sh->reconstruct_state is reconstruct_state_result
sh->state is STRIPE_EXPANDING | STRIPE_EXPAND_READY

Now it needs to write the data to disks, so it should execute this code:

         /* Finish reconstruct operations initiated by the expansion process */
         if (sh->reconstruct_state == reconstruct_state_result) {

But because the journal disk has been removed, it executes this code instead:

         if (s.failed > conf->max_degraded ||
             (s.log_failed && s.injournal == 0)) {
                 sh->check_state = 0;
                 sh->reconstruct_state = 0;

After sh->reconstruct_state is set to zero, it goes back and calculates the parity again.
So it is stuck in a dead loop.

Can we allow the reshape to happen in this case? Or is it OK to just return failure for the
command mdadm -G -n5 /dev/md0 in this case?
We actually don't support reshape with log enabled yet. How about this one:


diff --git a/drivers/md/raid5-log.h b/drivers/md/raid5-log.h
index a001808a2b77..bfb811407061 100644
--- a/drivers/md/raid5-log.h
+++ b/drivers/md/raid5-log.h
@@ -46,6 +46,11 @@ extern int ppl_modify_log(struct r5conf *conf, struct md_rdev *rdev, bool add);
 extern void ppl_quiesce(struct r5conf *conf, int quiesce);
 extern int ppl_handle_flush_request(struct r5l_log *log, struct bio *bio);
 
+static inline bool raid5_has_log(struct r5conf *conf)
+{
+	return test_bit(MD_HAS_JOURNAL, &conf->mddev->flags);
+}
+
 static inline bool raid5_has_ppl(struct r5conf *conf)
 {
 	return test_bit(MD_HAS_PPL, &conf->mddev->flags);
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 4ce0d7502fad..e4e98f47865d 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -733,7 +733,7 @@ static bool stripe_can_batch(struct stripe_head *sh)
 {
 	struct r5conf *conf = sh->raid_conf;
 
-	if (conf->log || raid5_has_ppl(conf))
+	if (raid5_has_log(conf) || raid5_has_ppl(conf))
 		return false;
 	return test_bit(STRIPE_BATCH_READY, &sh->state) &&
 		!test_bit(STRIPE_BITMAP_PENDING, &sh->state) &&
@@ -7737,7 +7737,7 @@ static int raid5_resize(struct mddev *mddev, sector_t sectors)
 	sector_t newsize;
 	struct r5conf *conf = mddev->private;
 
-	if (conf->log || raid5_has_ppl(conf))
+	if (raid5_has_log(conf) || raid5_has_ppl(conf))
 		return -EINVAL;
 	sectors &= ~((sector_t)conf->chunk_sectors - 1);
 	newsize = raid5_size(mddev, sectors, mddev->raid_disks);
@@ -7788,7 +7788,7 @@ static int check_reshape(struct mddev *mddev)
 {
 	struct r5conf *conf = mddev->private;
 
-	if (conf->log || raid5_has_ppl(conf))
+	if (raid5_has_log(conf) || raid5_has_ppl(conf))
 		return -EINVAL;
 	if (mddev->delta_disks == 0 &&
 	    mddev->new_layout == mddev->layout &&
Hi Shaohua

The patch fixes this problem. Thanks for your time.

Best Regards
Xiao


