On Thu, 7 Jun 2012 15:47:46 +0300 Alexander Lyakas <alex.bolshoy@xxxxxxxxx> wrote: > Thanks for commenting, Neil, > > On Thu, Jun 7, 2012 at 2:24 PM, NeilBrown <neilb@xxxxxxx> wrote: > > On Thu, 7 Jun 2012 10:22:24 +0300 Alexander Lyakas <alex.bolshoy@xxxxxxxxx> > > wrote: > > > >> Hi again Neil, and Andrey, > >> > >> looking at this email thread: > >> http://www.spinics.net/lists/raid/msg36236.html > >> between you and Andrey, the conclusion was: > >> "So the correct thing to do is to *not* update the metadata on the > >> recovering device until recovery completes. Then if it fails and is > >> re-added, it will look just the same as when it was re-added the first > >> time, and will do a bitmap-based recovery." > >> > >> I have two doubts about this decision: > >> 1) Since the event count on the recovering drive is not updated, this > >> means that after reboot, when array is re-assembled, this drive will > >> not be added to the array, and the user will have to manually re-add > >> it. I agree this is a minor thing. > > > > Still, if it can be fixed it should be. > > > >> 2) There are places in mdadm, which check for recovery_offset on the > >> drive and take decisions based upon that. Specifically, if there is > >> *no* recovery offset, the data on this drive is considered to be > >> consistent WRT to the failure time of the drive. So, for example, the > >> drive can be a candidate for bumping up events during "force > >> assembly". Now, when superblock on such drive is not updated during > >> recovery (so there is *no* recovery offset), mdadm will think that the > >> drive is consistent, while in fact, its data is totally unusable until > >> after recovery completes. That's because we have updated parts of the > >> drive, but did not complete bringing the whole drive in-sync. > > > > If mdadm would consider updating the event count if not recovery had started, > > then surely it is just as valid to do so once some recovery has started, even > > if it hasn't completed. > > The patch you accepted from me ("Don't consider disks with a valid > recovery offset as candidates for bumping up event count") actually > attempts to protect from that:) > > I don't understand why "it is just as valid to do so once some > recovery has started". My understanding is that once recovery of a > drive has started, its data is not consistent between different parts > of the drive, until the recovery completes. This is because md does > bitmap-based recovery, and not kind of journal/transaction-log based > recovery. > > However, one could argue that for force-assembly case, when data > anyways can come up as partially corrupted, this is less important. Exactly. And mdadm only updates event counts in the force-assembly case so while it might not be ideal, it is the best we can do. > > I would still think that there is value in recoding in a superblock > that a drive is recovering. Probably. It is a bit unfortunate that if you stop an array that is recovering after a --re-add, you cannot simply 'assemble' it again and get it back to the same state. I'll think more on that. Meanwhile, this patch might address your other problem. It allows --re-add to work if a non-bitmap rebuild fails and is then re-added. diff --git a/drivers/md/md.c b/drivers/md/md.c index c601c4b..d31852e 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -5784,7 +5784,7 @@ static int add_new_disk(struct mddev * mddev, mdu_disk_info_t *info) super_types[mddev->major_version]. validate_super(mddev, rdev); if ((info->state & (1<<MD_DISK_SYNC)) && - (!test_bit(In_sync, &rdev->flags) || + (test_bit(Faulty, &rdev->flags) || rdev->raid_disk != info->raid_disk)) { /* This was a hot-add request, but events doesn't * match, so reject it. > > > > >> > >> Can you pls comment on my doubts? > > > > I think there is probably room for improvement here but I don't think there > > are any serious problems. > > > > However I'm about to go on leave for a couple of week so I'm unlikely to > > think about it for a while. I've made a note to look at it properly when I > > get back. > > > > Indeed, don't you think about this while you are resting! :-) Thanks. I did have a very enjoyable break. NeilBrown
Attachment:
signature.asc
Description: PGP signature