Hi Neil,

>>
>> I would still think that there is value in recording in a superblock
>> that a drive is recovering.
>
> Probably.  It is a bit unfortunate that if you stop an array that is
> recovering after a --re-add, you cannot simply 'assemble' it again and
> get it back to the same state.
> I'll think more on that.

As I mentioned, I see the additional re-add as a minor thing, but agree
it's better to fix it. What bothers me more is that we don't know that
the drive is being recovered. A user might look at the superblock and
assume the data on the drive is consistent as of some point in time (the
time of the drive failure), whereas during bitmap-based recovery the
data is actually unusable until recovery completes successfully. So the
user might think it's okay to try to run his app on this drive.
Yes, please think about this.

>
> Meanwhile, this patch might address your other problem.  It allows --re-add
> to work if a non-bitmap rebuild fails and is then re-added.
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index c601c4b..d31852e 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -5784,7 +5784,7 @@ static int add_new_disk(struct mddev * mddev, mdu_disk_info_t *info)
>  				super_types[mddev->major_version].
>  					validate_super(mddev, rdev);
>  			if ((info->state & (1<<MD_DISK_SYNC)) &&
> -			    (!test_bit(In_sync, &rdev->flags) ||
> +			    (test_bit(Faulty, &rdev->flags) ||
>  			     rdev->raid_disk != info->raid_disk)) {
>  				/* This was a hot-add request, but events doesn't
>  				 * match, so reject it.
>

I have tested a slightly different patch that you suggested earlier:
just removing the !test_bit(In_sync, &rdev->flags) check. I confirm that
it solves the problem.
The Faulty bit check seems redundant to me, because:
- it can be set only by validate_super(), and only if that drive's role
  is 0xfffe in the sb->roles[] array;
- a long time ago I asked you how it can happen that a drive thinks
  about *itself* that it is Faulty (has 0xfffe for its role in its own
  superblock), and you said this should never happen.

Anyway, I also tested the patch you suggested, and it works as well.

Is there any chance to see this fix in ubuntu-precise?

Thanks again for your support,
Alex.