On Fri, Aug 24, 2018 at 03:53:59AM -0400, Xiao Ni wrote:
> Hi all
>
> The reshape can be stuck during raid5 reshape when raid5 journal misses. It can
> be reproduced 100%
>
> The test steps are:
> 1. mdadm -CR /dev/md0 -l5 -n4 /dev/sd[b-e]1 --write-journal /dev/sdf1
> 2. mdadm --wait /dev/md0
> 3. mdadm /dev/md0 -f /dev/sdf1
> 4. mdadm /dev/md0 -r /dev/sdf1
> 5. mdadm /dev/md0 -a /dev/sdf1
> 6. mdadm -G -n5 /dev/md0
>
> Reshape request has 4 steps:
> 1. read data for source stripes
> 2. write source strips data to target stripes
> 3. calculate parity for target stripes
> 4. write target stripes to disks.
>
> After step3:
> sh->reconstruct_state is reconstruct_state_result
> sh->state is STRIPE_EXPANDING | STRIPE_EXPAND_READY
>
> Now it needs to write data to disks. And it needs to execute this part code:
>
> /* Finish reconstruct operations initiated by the expansion process */
> if (sh->reconstruct_state == reconstruct_state_result) {
>
> But the journal disk is removed, it execute this part code:
>
> if (s.failed > conf->max_degraded ||
>     (s.log_failed && s.injournal == 0)) {
>         sh->check_state = 0;
>         sh->reconstruct_state = 0;
>
> After setting sh->reconstruct_state to zero, it will go to calculate the parity again.
> Now it's stuck in a dead loop.
>
> Can we allow the reshape happen in this case? Is it ok just to return failure for command
> `mdadm -G -n5 /dev/md0` in this case?

We actually don't support reshape with log enabled yet.
How about this one:

diff --git a/drivers/md/raid5-log.h b/drivers/md/raid5-log.h
index a001808a2b77..bfb811407061 100644
--- a/drivers/md/raid5-log.h
+++ b/drivers/md/raid5-log.h
@@ -46,6 +46,11 @@ extern int ppl_modify_log(struct r5conf *conf, struct md_rdev *rdev, bool add);
 extern void ppl_quiesce(struct r5conf *conf, int quiesce);
 extern int ppl_handle_flush_request(struct r5l_log *log, struct bio *bio);
 
+static inline bool raid5_has_log(struct r5conf *conf)
+{
+	return test_bit(MD_HAS_JOURNAL, &conf->mddev->flags);
+}
+
 static inline bool raid5_has_ppl(struct r5conf *conf)
 {
 	return test_bit(MD_HAS_PPL, &conf->mddev->flags);
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 4ce0d7502fad..e4e98f47865d 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -733,7 +733,7 @@ static bool stripe_can_batch(struct stripe_head *sh)
 {
 	struct r5conf *conf = sh->raid_conf;
 
-	if (conf->log || raid5_has_ppl(conf))
+	if (raid5_has_log(conf) || raid5_has_ppl(conf))
 		return false;
 	return test_bit(STRIPE_BATCH_READY, &sh->state) &&
 		!test_bit(STRIPE_BITMAP_PENDING, &sh->state) &&
@@ -7737,7 +7737,7 @@ static int raid5_resize(struct mddev *mddev, sector_t sectors)
 	sector_t newsize;
 	struct r5conf *conf = mddev->private;
 
-	if (conf->log || raid5_has_ppl(conf))
+	if (raid5_has_log(conf) || raid5_has_ppl(conf))
 		return -EINVAL;
 	sectors &= ~((sector_t)conf->chunk_sectors - 1);
 	newsize = raid5_size(mddev, sectors, mddev->raid_disks);
@@ -7788,7 +7788,7 @@ static int check_reshape(struct mddev *mddev)
 {
 	struct r5conf *conf = mddev->private;
 
-	if (conf->log || raid5_has_ppl(conf))
+	if (raid5_has_log(conf) || raid5_has_ppl(conf))
 		return -EINVAL;
 	if (mddev->delta_disks == 0 &&
 	    mddev->new_layout == mddev->layout &&
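
To illustrate why the checks switch from conf->log to the MD_HAS_JOURNAL flag, here is a small user-space model. It is only a sketch, not kernel code: every model_* and MODEL_* name is made up for the illustration, and it assumes that removing the journal device clears the log pointer while the array stays marked as journal-backed, which is what the reproduction steps suggest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for mddev->flags bits; not the kernel definitions. */
enum {
	MODEL_HAS_JOURNAL = 1u << 0,
	MODEL_HAS_PPL     = 1u << 1,
};

struct model_mddev { unsigned long flags; };
struct model_log   { int placeholder; };

struct model_conf {
	struct model_mddev *mddev;
	struct model_log *log;	/* assumed to become NULL once the journal is removed */
};

/* Old-style check: only looks at the live log pointer. */
static bool old_blocks_reshape(const struct model_conf *conf)
{
	return conf->log != NULL || (conf->mddev->flags & MODEL_HAS_PPL) != 0;
}

/* New-style check: looks at the array-level flag, as raid5_has_log() does. */
static bool new_blocks_reshape(const struct model_conf *conf)
{
	return (conf->mddev->flags & (MODEL_HAS_JOURNAL | MODEL_HAS_PPL)) != 0;
}

/* Assumption: journal removal drops the log but leaves the flag set. */
static void model_remove_journal(struct model_conf *conf)
{
	conf->log = NULL;
}

/* Replays the reported scenario on the model. */
static void model_scenario(void)
{
	struct model_mddev md = { .flags = MODEL_HAS_JOURNAL };
	struct model_log lg = { 0 };
	struct model_conf conf = { .mddev = &md, .log = &lg };

	/* Journal present: both checks refuse the reshape. */
	assert(old_blocks_reshape(&conf) && new_blocks_reshape(&conf));

	model_remove_journal(&conf);

	/* Journal gone: the pointer check lets the reshape through ... */
	assert(!old_blocks_reshape(&conf));
	/* ... while the flag check still refuses it. */
	assert(new_blocks_reshape(&conf));
}
```

Calling model_scenario() from a main() runs the walk-through. In this model the pointer-based check stops protecting the array as soon as the journal device is removed, whereas the flag-based check keeps returning true, so the reshape request is rejected with -EINVAL up front instead of looping in the stripe handler.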