On Thu, 23 Oct 2014 19:04:48 +0300 Alexander Lyakas <alex.bolshoy@xxxxxxxxx> wrote: > Hi Neil, > I found at least one way of this happening. The problem is that in > md_update_sb() we allow to decrease the event count: > > /* If this is just a dirty<->clean transition, and the array is clean > * and 'events' is odd, we can roll back to the previous clean state */ > if (nospares > && (mddev->in_sync && mddev->recovery_cp == MaxSector) > && mddev->can_decrease_events > && mddev->events != 1) { > mddev->events--; > mddev->can_decrease_events = 0; > > Then we call bitmap_update_sb(). If we crash after we update (the > first or all of) bitmap superblocks, then after reboot, we will see > that bitmap event count is less than MD superblock event count. Then > we decide to do full resync. > > This can be easily reproduced by hacking bitmap_update_sb() to call > BUG(), after it calls write_page() in case event count was decreased. > > Why we are decreasing the event count??? Can we always increase it? > u64 is a lot to increase... The reason for decreasing the event count is so that we don't need to update the event count on spares - they can be left spun down. We for simple clean/dirty transitions with increment for clean->dirty and decrement for dirty->clean. But we should only use this optimisation when everything is simple. We really shouldn't do this when the array is degraded. Do this fix your problem? diff --git a/drivers/md/md.c b/drivers/md/md.c index 2c73fcb82593..98fd97b10e13 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -2244,6 +2244,7 @@ repeat: * and 'events' is odd, we can roll back to the previous clean state */ if (nospares && (mddev->in_sync && mddev->recovery_cp == MaxSector) + && mddev->degraded == 0 && mddev->can_decrease_events && mddev->events != 1) { mddev->events--; > > Some other doubt that I have is that bitmap_unplug() and > bitmap_daemon_work() call write_page() on page index=0. This page > contains both the superblock and also some dirty bits (could not we > waste 4KB on bitmap superblock???). I am not sure, but I wonder > whether this call can race with md_update_sb (which explicitly calls > bitmap_update_sb), and somehow write the outdated superblock, after > bitmap_update_sb has completed writing it. > storage.sb_page is exactly the same as storage.filemap[0] So once an update has happened, the "outdated superblock" doesn't exist anywhere to be written out from. > Yet another suspect is when loading the bitmap we basically load it > from the first up-to-date drive. Maybe we should have scanned all the > bitmap superblocks, and selected one that has the higher event count > (although as we saw "higher" does not necessarily mean "more > up-to-date"). > > Anyways, back to decrementing the event count. Do you see any issue > with not doing this and always incrementing? > > Thanks, > Alex. > Thanks, NeilBrown
Attachment:
pgpZ2xmiLKKYz.pgp
Description: OpenPGP digital signature