On Fri, 6 Jun 2014 14:59:32 +0300 Alexander Lyakas <alex.bolshoy@xxxxxxxxx> wrote:

> Hi Neil,
> testing the following scenario:
>
> 1) create a raid1 with drives A and B, wait for resync to complete
> (verify mismatch_cnt is 0)
> 2) drive B fails, the array continues to operate degraded, and new
> data is written to the array
> 3) add a fresh drive C to the array (after zeroing any possible
> superblock on C)
> 4) wait for C's recovery to complete
>
> At this point, for some reason "bitmap->events_cleared" is not
> updated; it remains 0, although the bitmap is clear.

We should update events_cleared after the first write once the array
becomes optimal.  I assume you didn't write to the array while it was
recovering, or afterwards?

> 5) grow the array by one slot:
> mdadm --grow /dev/md1 --raid-devices=3 --force
> 6) re-add drive B:
> mdadm --manage /dev/md1 --re-add /dev/sdb
>
> MD accepts this drive, because in super_1_validate:
>
> 	/* If adding to array with a bitmap, then we can accept an
> 	 * older device, but not too old.
> 	 */
> 	if (ev1 < mddev->bitmap->events_cleared)
> 		return 0;
>
> Since events_cleared == 0, this condition DOES NOT hold, and drive B
> is accepted.

Yes, that is bad.
I guess we need to update events_cleared when recovery completes,
because bits in the bitmap are cleared then too.
Either bitmap_end_sync or the two places that call it need to update
events_cleared just like bitmap_endwrite does (a rough sketch is
appended below).

> 7) recovery begins and completes immediately, as the bitmap is clear
> 8) issuing "echo check > ..." yields a lot of mismatches
> (naturally, as B's data was not synced)
>
> Is this a valid scenario? Any idea why events_cleared is not updated?

Yes, the scenario is valid.  It is a bug and should be fixed.
Would you like to write and test a patch as discussed above?

Thanks,
NeilBrown

> Kernel is 3.8.13
>
> Thanks,
> Alex.
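
For reference, one possible shape of the change discussed above.  This
is an untested sketch against the 3.8-era drivers/md/bitmap.c, assuming
the field and helper names from that tree (bitmap->mddev,
bitmap->need_sync, bitmap->sysfs_can_clear, bitmap_end_sync); it is an
illustration of bumping events_cleared the way bitmap_endwrite()
already does, not a definitive patch:

	void bitmap_close_sync(struct bitmap *bitmap)
	{
		/* Sync has finished, and any bitmap chunks that weren't
		 * synced properly have been aborted.  It remains to us to
		 * clear the RESYNC bit wherever it is still on.
		 */
		sector_t sector = 0;
		sector_t blocks;

		if (!bitmap)
			return;

		while (sector < bitmap->mddev->resync_max_sectors) {
			bitmap_end_sync(bitmap, sector, &blocks, 0);
			sector += blocks;
		}

		/* Sketch of the proposed addition: the loop above just
		 * cleared bits, so a device whose event count predates this
		 * point can no longer be trusted for a bitmap-based re-add.
		 * Record that, mirroring what bitmap_endwrite() does on a
		 * successful write.  (A real patch would also need to
		 * consider locking against concurrent updates of
		 * events_cleared.)
		 */
		if (!bitmap->mddev->degraded &&
		    bitmap->events_cleared < bitmap->mddev->events) {
			bitmap->events_cleared = bitmap->mddev->events;
			bitmap->need_sync = 1;
			sysfs_notify_dirent_safe(bitmap->sysfs_can_clear);
		}
	}

With something like this in place, the re-add in step 6 would see
ev1 < mddev->bitmap->events_cleared for the stale drive B, so the
bitmap-based fast re-add would be refused and a full recovery done
instead.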