On Thu, May 8, 2008 at 9:40 PM, Neil Brown <neilb@xxxxxxx> wrote:
>
> On Thursday May 8, snitzer@xxxxxxxxx wrote:
> > On Thu, May 8, 2008 at 2:13 AM, Neil Brown <neilb@xxxxxxx> wrote:
> > > On Tuesday May 6, snitzer@xxxxxxxxx wrote:
> > > >
> > > > It looks like bitmap_update_sb()'s incrementing of events_cleared (on
> > > > behalf of the local member) could be racing with the fact that the NBD
> > > > member becomes faulty (thereby making the array degraded). This
> > > > allows events_cleared to reflect a clean->dirty transition that last
> > > > occurred before the array became degraded. My reasoning is: If it was
> > > > a clean->dirty transition the bitmap still has the associated dirty
> > > > bit set in the local member's bitmap, so using the bitmap to resync is
> > > > valid.
> > > >
> > > > thanks,
> > > > Mike
> > >
> > > Thanks for persisting. I think I understand what is going on now.
> > >
> > > How about this patch? It is similar to yours, but instead of depending
> > > on the odd/even state of the event counter, it directly checks the
> > > clean/dirty state of the array.
> >
> > Hi Neil,
> >
> > Your revised patch works great and is obviously cleaner.
>
> But I'm still not happy with it :-(
> I suspect there might be other cases where it will still do the wrong
> thing.
> The real problem is that we are updating events_cleared too early. We
> are setting it to the new event counter before that is even written out.
>
> So I've come up with this patch, which I think more clearly
> encapsulates what events_cleared means. It is now set to the current
> 'events' counter immediately before we clear any bit.
>
> If you could test it, I'd really appreciate it.

Unfortunately my testing with this patch results in a full resync.
Here is the state of the array after shutdown:

# mdadm -X /dev/nbd0 /dev/sdq
        Filename : /dev/nbd0
           Magic : 6d746962
         Version : 4
            UUID : 7140cc3c:8681416c:12c5668a:984ca55d
          Events : 896
  Events Cleared : 897
           State : OK
       Chunksize : 128 KB
          Daemon : 5s flush period
      Write Mode : Normal
       Sync Size : 52428736 (50.00 GiB 53.69 GB)
          Bitmap : 409600 bits (chunks), 1 dirty (0.0%)
        Filename : /dev/sdq
           Magic : 6d746962
         Version : 4
            UUID : 7140cc3c:8681416c:12c5668a:984ca55d
          Events : 898
  Events Cleared : 897
           State : OK
       Chunksize : 128 KB
          Daemon : 5s flush period
      Write Mode : Normal
       Sync Size : 52428736 (50.00 GiB 53.69 GB)
          Bitmap : 409600 bits (chunks), 0 dirty (0.0%)

# mdadm --examine /dev/nbd0 /dev/sdq
/dev/nbd0:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 7140cc3c:8681416c:12c5668a:984ca55d
  Creation Time : Thu May  8 06:55:32 2008
     Raid Level : raid1
  Used Dev Size : 52428736 (50.00 GiB 53.69 GB)
     Array Size : 52428736 (50.00 GiB 53.69 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0

    Update Time : Thu May  8 18:07:47 2008
          State : clean
Internal Bitmap : present
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : df65cb35 - correct
         Events : 0.896

      Number   Major   Minor   RaidDevice State
this     1      43        0        1      active sync write-mostly   /dev/nbd0

   0     0      65        0        0      active sync   /dev/sdq
   1     1      43        0        1      active sync write-mostly   /dev/nbd0
/dev/sdq:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 7140cc3c:8681416c:12c5668a:984ca55d
  Creation Time : Thu May  8 06:55:32 2008
     Raid Level : raid1
  Used Dev Size : 52428736 (50.00 GiB 53.69 GB)
     Array Size : 52428736 (50.00 GiB 53.69 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0

    Update Time : Thu May  8 18:07:49 2008
          State : clean
Internal Bitmap : present
 Active Devices : 1
Working Devices : 1
 Failed Devices : 1
  Spare Devices : 0
       Checksum : df65c956 - correct
         Events : 0.898

      Number   Major   Minor   RaidDevice State
this     0      65        0        0      active sync   /dev/sdq

   0     0      65        0        0      active sync   /dev/sdq
   1     1       0        0        1      faulty removed

Was I supposed to use this latest patch in combination with your
previous patch (to validate_super)?

Because you'll note that with your most recent patch nbd0's events
(ev1) is still one less than sdq's events_cleared. As such the
validate_super's "ev1 < mddev->bitmap->events_cleared" check triggers
a full rebuild. The kernel log shows:

md: md0 stopped.
md: bind<nbd0>
md: bind<sdq>
md: kicking non-fresh nbd0 from array!
md: unbind<nbd0>
md: export_rdev(nbd0)
raid1: raid set md0 active with 1 out of 2 mirrors
md0: bitmap initialized from disk: read 13/13 pages, set 0 bits, status: 0
created bitmap (200 pages) for device md0
Nope!!! ev1 (896) < mddev->bitmap->events_cleared (897)
md: bind<nbd0>
RAID1 conf printout:
 --- wd:1 rd:2
 disk 0, wo:0, o:1, dev:sdq
 disk 1, wo:1, o:1, dev:nbd0
md: recovery of RAID array md0
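
To make the comparison concrete, here is a minimal sketch of the decision
as I understand it. It is not the actual md code: the struct and function
names are made up for illustration, and only the
"ev1 < mddev->bitmap->events_cleared" comparison comes from the check
quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified types; these are not the kernel's structures. */
struct member_state {
    uint64_t events;          /* event counter from the member's superblock (ev1) */
};

struct bitmap_state {
    uint64_t events_cleared;  /* events_cleared recorded in the array's bitmap */
};

/*
 * Returns true when the bitmap may be used to resync the returning member,
 * false when the member is too old and a full recovery is needed.
 * Mirrors the "ev1 < events_cleared" test: a member whose event count is
 * below events_cleared is rejected.
 */
static bool can_use_bitmap(const struct member_state *m,
                           const struct bitmap_state *b)
{
    return m->events >= b->events_cleared;
}
```

With the values from the log (ev1 = 896, events_cleared = 897) this
returns false, which matches the full recovery I am seeing.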