On Fri, May 16, 2008 at 7:54 AM, Neil Brown <neilb@xxxxxxx> wrote:
> On Friday May 9, snitzer@xxxxxxxxx wrote:
>> On Fri, May 9, 2008 at 2:01 AM, Neil Brown <neilb@xxxxxxx> wrote:
>> >
>> > On Friday May 9, snitzer@xxxxxxxxx wrote:
>> >> > > Unfortunately my testing with this patch results in a full resync.
...
>> > diff .prev/drivers/md/bitmap.c ./drivers/md/bitmap.c
>> > --- .prev/drivers/md/bitmap.c  2008-05-09 11:02:13.000000000 +1000
>> > +++ ./drivers/md/bitmap.c      2008-05-09 16:00:07.000000000 +1000
>> >
>> > @@ -465,8 +465,6 @@ void bitmap_update_sb(struct bitmap *bit
>> >                 spin_unlock_irqrestore(&bitmap->lock, flags);
>> >         sb = (bitmap_super_t *)kmap_atomic(bitmap->sb_page, KM_USER0);
>> >         sb->events = cpu_to_le64(bitmap->mddev->events);
>> > -       if (!bitmap->mddev->degraded)
>> > -               sb->events_cleared = cpu_to_le64(bitmap->mddev->events);
>>
>> Before, events_cleared was _not_ updated if the array was degraded.
>> Your patch doesn't appear to maintain that design.
>
> It does, but it is well hidden.
> Bits in the bitmap are only cleared when the array is not degraded.
> The new code for updating events_cleared is only triggered when a bit
> is about to be cleared.

Hi Neil,

Sorry about not getting back to you sooner.  Thanks for putting
significant time into chasing this problem.

I tested your most recent patch and unfortunately still hit the case
where the nbd member becomes degraded yet the array continues to clear
bits (the events_cleared of the non-degraded member is higher than that
of the degraded member).  Is this behavior somehow expected/correct?

This was the state of the array after the nbd0 member became degraded
and the array was stopped:

# mdadm -X /dev/nbd0 /dev/sdq
        Filename : /dev/nbd0
           Magic : 6d746962
         Version : 4
            UUID : 7140cc3c:8681416c:12c5668a:984ca55d
          Events : 2642
  Events Cleared : 2642
           State : OK
       Chunksize : 128 KB
          Daemon : 5s flush period
      Write Mode : Normal
       Sync Size : 52428736 (50.00 GiB 53.69 GB)
          Bitmap : 409600 bits (chunks), 1 dirty (0.0%)
        Filename : /dev/sdq
           Magic : 6d746962
         Version : 4
            UUID : 7140cc3c:8681416c:12c5668a:984ca55d
          Events : 2646
  Events Cleared : 2645
           State : OK
       Chunksize : 128 KB
          Daemon : 5s flush period
      Write Mode : Normal
       Sync Size : 52428736 (50.00 GiB 53.69 GB)
          Bitmap : 409600 bits (chunks), 1 dirty (0.0%)

At the time the nbd0 member became degraded, events_cleared was 2642.
What I'm failing to understand is how sdq's events_cleared could be
allowed to increment higher than 2642.

I've not yet taken steps to understand/verify your test script, so I'm
not sure it models my test scenario yet.

Mike
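
For reference, a minimal standalone C sketch of the invariant Neil describes
above (bits are only cleared while the array is not degraded, and
events_cleared only advances when a bit is about to be cleared).  This is not
the actual md/bitmap.c code; the toy_* types and helpers are hypothetical
stand-ins for the kernel structures, only meant to show why a degraded
member's events_cleared should never fall behind the healthy member's.

/* Build: cc -std=c99 -Wall toy_bitmap.c */
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

struct toy_array {
        uint64_t events;          /* stand-in for mddev->events       */
        bool     degraded;        /* stand-in for mddev->degraded     */
        uint64_t events_cleared;  /* stand-in for sb->events_cleared  */
        unsigned dirty_bits;      /* bits still set in the toy bitmap */
};

/* Called when a chunk is clean and its bit could be cleared. */
static void toy_clear_bit(struct toy_array *a)
{
        if (a->degraded)
                return;           /* never clear bits while degraded  */
        if (a->dirty_bits == 0)
                return;

        /* events_cleared only moves forward here, right before a bit
         * is cleared, so it can never advance on a degraded array.   */
        if (a->events_cleared < a->events)
                a->events_cleared = a->events;
        a->dirty_bits--;
}

int main(void)
{
        struct toy_array a = { .events = 2642, .degraded = false,
                               .events_cleared = 2640, .dirty_bits = 2 };

        toy_clear_bit(&a);        /* healthy: bit cleared, counter moves */
        printf("events_cleared=%llu dirty=%u\n",
               (unsigned long long)a.events_cleared, a.dirty_bits);

        a.degraded = true;
        a.events = 2646;          /* more writes arrive while degraded   */
        toy_clear_bit(&a);        /* degraded: nothing should change     */
        printf("events_cleared=%llu dirty=%u\n",
               (unsigned long long)a.events_cleared, a.dirty_bits);
        return 0;
}

Under that model, sdq reporting Events Cleared : 2645 while the array had
been degraded since event 2642 would mean a bit was cleared (or the counter
advanced) after the degradation, which is exactly the behavior in question.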