Re: "bitmap file is out of date, doing full recovery"

NeilBrown <neilb@xxxxxxx> · Wed, 29 Oct 2014 10:19:50 +1100

On Thu, 23 Oct 2014 19:04:48 +0300 Alexander Lyakas <alex.bolshoy@xxxxxxxxx>
wrote:

> Hi Neil,
> I found at least one way of this happening. The problem is that in
> md_update_sb() we allow to decrease the event count:
> 
>     /* If this is just a dirty<->clean transition, and the array is clean
>      * and 'events' is odd, we can roll back to the previous clean state */
>     if (nospares
>         && (mddev->in_sync && mddev->recovery_cp == MaxSector)
>         && mddev->can_decrease_events
>         && mddev->events != 1) {
>         mddev->events--;
>         mddev->can_decrease_events = 0;
> 
> Then we call bitmap_update_sb(). If we crash after we update (the
> first or all of) bitmap superblocks, then after reboot, we will see
> that bitmap event count is less than MD superblock event count. Then
> we decide to do full resync.
> 
> This can be easily reproduced by hacking bitmap_update_sb() to call
> BUG(), after it calls write_page() in case event count was decreased.
> 
> Why we are decreasing the event count??? Can we always increase it?
> u64 is a lot to increase...

The reason for decreasing the event count is so that we don't need to update
the event count on spares - they can be left spun down.
We for simple clean/dirty transitions with increment for clean->dirty and
decrement for dirty->clean.  But we should only use this optimisation when
everything is simple.
We really shouldn't do this when the array is degraded.
Do this fix your problem?

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 2c73fcb82593..98fd97b10e13 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -2244,6 +2244,7 @@ repeat:
 	 * and 'events' is odd, we can roll back to the previous clean state */
 	if (nospares
 	    && (mddev->in_sync && mddev->recovery_cp == MaxSector)
+	    && mddev->degraded == 0
 	    && mddev->can_decrease_events
 	    && mddev->events != 1) {
 		mddev->events--;


> 
> Some other doubt that I have is that bitmap_unplug() and
> bitmap_daemon_work() call write_page() on page index=0. This page
> contains both the superblock and also some dirty bits (could not we
> waste 4KB on bitmap superblock???). I am not sure, but I wonder
> whether this call can race with md_update_sb (which explicitly calls
> bitmap_update_sb), and somehow write the outdated superblock, after
> bitmap_update_sb has completed writing it.
> 

storage.sb_page is exactly the same as storage.filemap[0]
So once an update has happened, the "outdated superblock" doesn't exist
anywhere to be written out from.

> Yet another suspect is when loading the bitmap we basically load it
> from the first up-to-date drive. Maybe we should have scanned all the
> bitmap superblocks, and selected one that has the higher event count
> (although as we saw "higher" does not necessarily mean "more
> up-to-date").
> 
> Anyways, back to decrementing the event count. Do you see any issue
> with not doing this and always incrementing?
> 
> Thanks,
> Alex.
> 

Thanks,
NeilBrown
Attachment:
pgpZ2xmiLKKYz.pgp

Description: OpenPGP digital signature