Re: "bitmap file is out of date, doing full recovery"

NeilBrown <neilb@xxxxxxx> · Wed, 17 Dec 2014 09:26:06 +1100

On Sun, 14 Dec 2014 14:11:05 +0200 Alexander Lyakas <alex.bolshoy@xxxxxxxxx>
wrote:

> Hi Neil,
> This issue keeps happening to us. Do you see any problem in always
> incrementing the event count?

The reason we don't always increment the event count is that doing so
would wake up spare devices unnecessarily.

Maybe the event counts on spare devices should be ignored....
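
To make the trade-off concrete, here is a minimal user-space sketch of the
failure described in the quoted thread. It is not kernel code: the struct,
the function names and the starting count of 10 are all hypothetical, the
recovery_cp and degraded checks are left out, and the else branch is my
assumption about what happens when the rollback is not taken. It models a
crash after the bitmap superblock is written but before the MD superblock
is, with and without the decrement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space model of the event counters; not kernel code. */
struct model_array {
    uint64_t events;          /* in-memory event count */
    bool in_sync;             /* array is currently clean */
    bool can_decrease_events; /* previous update was a plain increment */
    uint64_t sb_events;       /* count last written to the MD superblock */
    uint64_t bitmap_events;   /* count last written to the bitmap superblock */
};

/* Choose the next event count. With allow_rollback set, a dirty->clean
 * transition decrements, following the quoted md_update_sb() excerpt
 * (recovery_cp and degraded checks omitted for brevity). */
static void model_next_events(struct model_array *a, bool nospares,
                              bool allow_rollback)
{
    if (allow_rollback && nospares && a->in_sync
        && a->can_decrease_events && a->events != 1) {
        a->events--;
        a->can_decrease_events = false;
    } else {
        /* Assumption: otherwise the count is simply incremented. */
        a->events++;
        a->can_decrease_events = nospares;
    }
}

/* Simulate clean->dirty (fully written), then dirty->clean where only the
 * bitmap superblock reaches disk before a crash. Returns true if the
 * bitmap would look out of date after reboot. */
static bool model_crash_leaves_bitmap_stale(bool allow_rollback)
{
    struct model_array a = { .events = 10, .in_sync = true,
                             .can_decrease_events = false,
                             .sb_events = 10, .bitmap_events = 10 };

    /* clean -> dirty: both superblocks are written. */
    a.in_sync = false;
    model_next_events(&a, true, allow_rollback);
    assert(a.events == 11);
    a.bitmap_events = a.events;
    a.sb_events = a.events;

    /* dirty -> clean: the bitmap superblock is written, then we crash. */
    a.in_sync = true;
    model_next_events(&a, true, allow_rollback);
    a.bitmap_events = a.events;
    /* Crash here: sb_events still holds the dirty count. */

    return a.bitmap_events < a.sb_events;
}
```

Under these assumptions the bitmap looks stale after the crash only when
the rollback is allowed; with a plain increment the bitmap count ends up
ahead of the MD superblock instead. The sketch says nothing about the cost
of waking spares, which is the whole reason the rollback exists.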

NeilBrown

> 
> Thanks,
> Alex.
> 
> On Tue, Nov 4, 2014 at 11:17 AM, Alexander Lyakas
> <alex.bolshoy@xxxxxxxxx> wrote:
> > Hi Neil,
> > thank you for your comments.
> >
> > On Wed, Oct 29, 2014 at 1:19 AM, NeilBrown <neilb@xxxxxxx> wrote:
> >> On Thu, 23 Oct 2014 19:04:48 +0300 Alexander Lyakas <alex.bolshoy@xxxxxxxxx>
> >> wrote:
> >>
> >>> Hi Neil,
> >>> I found at least one way this can happen. The problem is that in
> >>> md_update_sb() we allow the event count to decrease:
> >>>
> >>>     /* If this is just a dirty<->clean transition, and the array is clean
> >>>      * and 'events' is odd, we can roll back to the previous clean state */
> >>>     if (nospares
> >>>         && (mddev->in_sync && mddev->recovery_cp == MaxSector)
> >>>         && mddev->can_decrease_events
> >>>         && mddev->events != 1) {
> >>>         mddev->events--;
> >>>         mddev->can_decrease_events = 0;
> >>>
> >>> Then we call bitmap_update_sb(). If we crash after we update (the
> >>> first or all of) the bitmap superblocks, then after reboot we will
> >>> see that the bitmap event count is less than the MD superblock event
> >>> count. Then we decide to do a full resync.
> >>>
> >>> This can be easily reproduced by hacking bitmap_update_sb() to call
> >>> BUG() after it calls write_page() whenever the event count was
> >>> decreased.
> >>>
> >>> Why are we decreasing the event count? Can we always increase it?
> >>> A u64 leaves a lot of room to increase...
> >>
> >> The reason for decreasing the event count is so that we don't need to update
> >> the event count on spares - they can be left spun down.
> >> For simple clean/dirty transitions we increment for clean->dirty and
> >> decrement for dirty->clean.  But we should only use this optimisation when
> >> everything is simple.
> >> We really shouldn't do this when the array is degraded.
> >> Does this fix your problem?
> >>
> >> diff --git a/drivers/md/md.c b/drivers/md/md.c
> >> index 2c73fcb82593..98fd97b10e13 100644
> >> --- a/drivers/md/md.c
> >> +++ b/drivers/md/md.c
> >> @@ -2244,6 +2244,7 @@ repeat:
> >>          * and 'events' is odd, we can roll back to the previous clean state */
> >>         if (nospares
> >>             && (mddev->in_sync && mddev->recovery_cp == MaxSector)
> >> +           && mddev->degraded == 0
> >>             && mddev->can_decrease_events
> >>             && mddev->events != 1) {
> >>                 mddev->events--;
> >>
> >>
> > No, unfortunately, this doesn't fix the problem. In my case, the array
> > is never degraded. Both drives are present and operational, then the
> > box crashes, and after reboot the bitmap event counter is lower than
> > we expect. Again, this is easily reproduced by hacking
> > bitmap_update_sb() as I mentioned earlier.
> >
> > In my case the array does not have spares. (There is another system
> > on top, which monitors the array and, if needed, adds a spare from a
> > "global" spare pool.) Is it OK in this case to always increment the
> > event count?
> >
> > Thanks,
> > Alex.
> >
> >
> >>>
> >>> Another doubt I have is that bitmap_unplug() and
> >>> bitmap_daemon_work() call write_page() on page index=0. This page
> >>> contains both the superblock and some dirty bits (couldn't we
> >>> spend 4KB on the bitmap superblock alone?). I am not sure, but I
> >>> wonder whether this call can race with md_update_sb() (which
> >>> explicitly calls bitmap_update_sb()), and somehow write the outdated
> >>> superblock after bitmap_update_sb() has completed writing it.
> >>>
> >>
> >> storage.sb_page is exactly the same as storage.filemap[0]
> >> So once an update has happened, the "outdated superblock" doesn't exist
> >> anywhere to be written out from.
> >>
> >>> Yet another suspect is that when loading the bitmap we basically load
> >>> it from the first up-to-date drive. Maybe we should scan all the
> >>> bitmap superblocks and select the one that has the highest event
> >>> count (although, as we saw, "higher" does not necessarily mean "more
> >>> up-to-date").
> >>>
> >>> Anyways, back to decrementing the event count. Do you see any issue
> >>> with not doing this and always incrementing?
> >>>
> >>> Thanks,
> >>> Alex.
> >>>
> >>
> >> Thanks,
> >> NeilBrown
