On Tue, Sep 25, 2018 at 11:06:54AM +0800, Guoqing Jiang wrote: > If we change the number of array's device after device > is removed from array, then add the device back to array, > we can see that device is added as active role instead of > spare which we expected. > > Please see the below link for details: > > https://marc.info/?l=linux-raid&m=153736982015076&w=2 > > This is caused by that we prefer to use device's previous > role which is recorded by saved_raid_disk, but we should > respect the new number of conf->raid_disks since it could > be changed after device is removed. > > Fixes: 70bcecdb1534 (md-cluster: Improve md_reload_sb to be less error prone") > Reported-by: Gioh Kim <gi-oh.kim@xxxxxxxxxxxxxxxx> > Tested-by: Gioh Kim <gi-oh.kim@xxxxxxxxxxxxxxxx> > Signed-off-by: Guoqing Jiang <gqjiang@xxxxxxxx> raid10 has similar code, I think we should fix it in generic code. does below code work for you? diff --git a/drivers/md/md.c b/drivers/md/md.c index 63ceabb4e020..a25ebf81b266 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -1774,6 +1774,10 @@ static int super_1_validate(struct mddev *mddev, struct md_rdev *rdev) } else set_bit(In_sync, &rdev->flags); rdev->raid_disk = role; + if (role >= mddev->raid_disks) { + rdev->saved_raid_disk = -1; + rdev->raid_disk = -1; + } break; } if (sb->devflags & WriteMostly1) > --- > drivers/md/raid1.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c > index 4e990246225e..1d54109071cc 100644 > --- a/drivers/md/raid1.c > +++ b/drivers/md/raid1.c > @@ -1734,6 +1734,7 @@ static int raid1_add_disk(struct mddev *mddev, struct md_rdev *rdev) > */ > if (rdev->saved_raid_disk >= 0 && > rdev->saved_raid_disk >= first && > + rdev->saved_raid_disk < conf->raid_disks && > conf->mirrors[rdev->saved_raid_disk].rdev == NULL) > first = last = rdev->saved_raid_disk; > > -- > 2.13.7 >