On Tuesday May 6, snitzer@xxxxxxxxx wrote: > > It looks like bitmap_update_sb()'s incrementing of events_cleared (on > behalf of the local member) could be racing with the fact that the NBD > member becomes faulty (whereby making the array degraded). This > allows the events_cleared to reflect a clean->dirty transition last > occurred before the array became degraded. My reasoning is: If it was > a clean->dirty transition the bitmap still has the associated dirty > bit set in the local member's bitmap, so using the bitmap to resync is > valid. > > thanks, > Mike Thanks for persisting. I think I understand what is going on now. How about this patch? It is similar to your, but instead of depending on the odd/even state of the event counter, it directly checks the clean/dirty state of the array. NeilBrown Signed-off-by: Neil Brown <neilb@xxxxxxx> ### Diffstat output ./drivers/md/md.c | 5 +++++ 1 file changed, 5 insertions(+) diff .prev/drivers/md/md.c ./drivers/md/md.c --- .prev/drivers/md/md.c 2008-05-02 14:49:05.000000000 +1000 +++ ./drivers/md/md.c 2008-05-08 16:10:48.000000000 +1000 @@ -843,6 +843,8 @@ static int super_90_validate(mddev_t *md /* if adding to array with a bitmap, then we can accept an * older device ... but not too old. */ + if (sb->state & (1<<MD_SB_CLEAN)) + ev1++; if (ev1 < mddev->bitmap->events_cleared) return 0; } else { @@ -1218,6 +1220,9 @@ static int super_1_validate(mddev_t *mdd /* If adding to array with a bitmap, then we can accept an * older device, but not too old. */ + if (mddev->recovery_cp == MaxSector) + /* array was clean, so can allow 'next' event */ + ev1++; if (ev1 < mddev->bitmap->events_cleared) return 0; } else { -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html