Hi Neil,

I've been looking into another scenario where a raid1 whose members have
an internal bitmap performs what seems to be an unnecessary 'fullsync' on
re-add.  I'm using 2.6.22.19 + 918f02383fb9ff5dba29709f3199189eeac55021

To be clear, this isn't a pathological bug with the generic sequence I'm
about to describe; it has more to do with my setup, where one of the
raid1 members is write-mostly via NBD.  The case I'm trying to resolve is
when the remote nbd-server races to shut down _before_ MD has been able
to stop the raid1 (while the array is still clean).  The nbd-client
therefore loses its connection and the nbd0 member becomes faulty.  So
the raid1 marks the remote nbd member faulty and degrades the array just
before the raid1 is stopped.

When the raid1 is reassembled, the previously "faulty" member is deemed
"non-fresh" and is kicked from the array (via super_90_validate's -EINVAL
return).  This "non-fresh" member is then hot-added to the raid1, and in
raid1_add_disk() 'fullsync' is almost always set (because
'saved_raid_disk' is -1); the relevant raid1_add_disk() logic is
excerpted below, just before the diff.

I added some "DEBUG:" logging and the log looks like this:

end_request: I/O error, dev nbd0, sector 6297352
md: super_written gets error=-5, uptodate=0
raid1: Disk failure on nbd0, disabling device.
	Operation continuing on 1 devices
RAID1 conf printout:
 --- wd:1 rd:2
 disk 0, wo:0, o:1, dev:sdd1
 disk 1, wo:1, o:0, dev:nbd0
RAID1 conf printout:
 --- wd:1 rd:2
 disk 0, wo:0, o:1, dev:sdd1
...
md: md0 stopped.
md: bind<nbd0>
md: bind<sdd1>
md: DEBUG: nbd0 is non-fresh because 'bad' event counter
md: kicking non-fresh nbd0 from array!
md: unbind<nbd0>
md: export_rdev(nbd0)
raid1: raid set md0 active with 1 out of 2 mirrors
md0: bitmap initialized from disk: read 13/13 pages, set 1 bits, status: 0
created bitmap (193 pages) for device md0
md: DEBUG: nbd0 rdev's ev1 (30186) < mddev->bitmap->events_cleared (30187)... rdev->raid_disk=-1
md: DEBUG: nbd0 saved_raid_disk=-1
md: bind<nbd0>
md: DEBUG: nbd0 recovery requires full-resync because rdev->saved_raid_disk < 0
RAID1 conf printout:
 --- wd:1 rd:2
 disk 0, wo:0, o:1, dev:sdd1
 disk 1, wo:1, o:1, dev:nbd0

Given that validate_super() determines nbd0's events to be less than the
raid1 bitmap's events_cleared, it is easy to see why 'saved_raid_disk' is
-1 on entry to raid1_add_disk().

For me, this events vs events_cleared mismatch is a regular occurrence.
The healthy member's bitmap's events_cleared is frequently one greater
than the faulty member's events (and events_cleared).

Why is it so detrimental for the "faulty" (or in my case "non-fresh")
member to have its events be less than the array's bitmap's
events_cleared?  Is there possibly a bug in how events_cleared is
incremented (when the raid1 is degraded right before being stopped)?
Doesn't an odd-valued events simply mean the array is dirty?  In fact,
I've seen the events decrement back by one when transitioning from
'dirty' to 'clean', e.g.:

[root@srv1 ~]# mdadm -X /dev/sdd1 /dev/nbd0
        Filename : /dev/sdd1
          Events : 881
  Events Cleared : 881
...
        Filename : /dev/nbd0
          Events : 881
  Events Cleared : 881

then seconds later:

[root@srv2 ~]# mdadm -X /dev/sdd1 /dev/nbd0
        Filename : /dev/sdd1
          Events : 880
  Events Cleared : 880
...
        Filename : /dev/nbd0
          Events : 880
  Events Cleared : 880

Would the attached "fix" at the end of this mail be invalid
(super_1_validate would need patching too)?

Any help would be appreciated, thanks.
Mike
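For reference, the raid1_add_disk() logic that triggers the full resync
looks roughly like this (paraphrased from my reading of 2.6.22
drivers/md/raid1.c; a sketch, not a verbatim quote -- 'p' and 'mirror'
come from the loop that scans conf->mirrors for a free slot):

	/* once a free mirror slot is found for the incoming rdev, a
	 * device that was not recently a member of this array
	 * (saved_raid_disk < 0) is scheduled for a full resync rather
	 * than a bitmap-based recovery
	 */
	rdev->raid_disk = mirror;
	if (rdev->saved_raid_disk < 0)
		conf->fullsync = 1;
	rcu_assign_pointer(p->rdev, rdev);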
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 827824a..454eb38 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -840,6 +840,7 @@ static int super_90_validate(mddev_t *mddev, mdk_rdev_t *rdev)
 		/* if adding to array with a bitmap, then we can accept an
 		 * older device ... but not too old.
 		 */
+		++ev1;
 		if (ev1 < mddev->bitmap->events_cleared)
 			return 0;
 	} else {
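For completeness, my understanding is that super_1_validate() does the
same events_cleared comparison when the array has a bitmap, so it would
need the equivalent change; something like the following untested sketch
(not a verbatim diff):

	} else if (mddev->bitmap) {
		/* same "older device, but not too old" check as in
		 * super_90_validate above
		 */
		++ev1;		/* proposed, mirroring the hunk above */
		if (ev1 < mddev->bitmap->events_cleared)
			return 0;
	} else {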