Re: 1.X metadata: Resuming an interrupted incremental recovery for RAID1.

NeilBrown <neilb@xxxxxxx> · Wed, 12 Oct 2011 14:05:17 +1100

On Tue, 11 Oct 2011 20:45:25 -0400 "Andrei E. Warkentin"
<andrey.warkentin@xxxxxxxxx> wrote:

> Hi group, Neil,
> 
> I've seen the following behavior -
> 1) Create a RAID1 array with two devices with an internal bitmap.
> 2) Degrade the array.
> 3) Write data in the array.
> 4) Re-add the removed member - this start an incremental recovery.
> 5) Interrrupt the recovery (cause I/O failure in the just re-added
> disk) - array degraded again.
> 6) Re-add the removed member - this starts a full recovery.

Yeh, it probably shouldn't do that.

> 
> If I understand, the choice behind incremental/full is based on the
> In_Sync bit, which for the two
> possibilities of an interrupted recovery, namely, an "active but
> recovering" disk (with a role) and "a spare prior
> to role assignment" (i.e. before remove_and_add_spares is run, I
> think), the In_Sync bit is never set.

Something like that ... though of course there are lots more horrible details.

> 
> It seems like it should be safe enough to resume an incremental
> recovery from where it left off, after all,
> the intent bitmap will still reflect the unsynchornized data, right?

Yes, it should.

> 
> How about something like the following?
> 
> 1) Add another SB feature - MD_FEATURE_IN_RECOVERY.
> 2) MD_FEATURE_IN_RECOVERY is set in in super_1_sync if
> rdev->saved_raid_disk != -1 and mddev->bitmap.
> 3) MD_FEATURE_IN_RECOVERY is unset in super_1_sync otherwise.
> 4) If MD_FEATURE_IN_RECOVERY is for the 'default' case in
> super_1_validate, set the In_Sync bit, causing an
>     incremental recovery to happen.

We probably do need another flag somewhere, as we have two different states
over-lapping.
The first time you re-added a device, it's recovery_offset was MAX and it's
event_count was old, but not older than the bitmap.
So we could do a bitmap-based recovery.

When it was re-added we would have updated the metadata to have a newer event
count, but a lower recovery_offset.  So it is no longer clear from the
metadata that a bitmap-based recovery is allowed.
We could leave the old metadata, but then the bitmap-based resync would
restart from the beginning.
You want the bitmap-based-resync to restart from recovery_offset which is a
new state.  So maybe we do need a new flag.

> 
> The above handles 99% (as far as I tested).

That's not bad.

> 
> The only case left is dealing with the 'spare' transition - in which
> case I also need to remember rdev->saved_raid_disk someplace
> in the superblock (and restore raid_disk and the In_Sync bit in
> super_1_validate as well).  If I understand correctly,
> sb->resync_offset is a
> safe place, since it's disregarded for a bitmapped rdev.

You've lost me ... and I was coping so well too!

What "spare transition".  I cannot see a point in the process where the
metadata would get marked as being a spare...

> 
> What do you think? Am I missing something or is there a better way of
> achieving what I am trying to do? I am basically
> trying to ensure that if an rdev went away during incremental
> recovery, then incremental recovery will resume if it is re-added.
> This
> will not affect adding a 'clean' spare (it will cause a full recovery).
> 

<thinks....>

I think that if a spare being bitmap-recovered fails and then gets re-added,
then we have to start the bitmap-recovery from the beginning.  i.e. we cannot
use the recovery_offset.  This is because it is entirely possible that while
the device was missing the second time, a write went to an address before
recovery_offset and so there is a new bit, which the recovery has to handle.

So the correct thing to do is to *not* update the metadata on the recovering
device until recovery completes.  Then if it fails and is re-added, it will
look just the same as when it was re-added the first time, and will do a
bitmap-based recovery.
Only when the bitmap-based recovery finishes should the metadata be updated
with a new event count, and then the bitmap can start forgetting those old
bits.

Credible?
</thinks>

NeilBrown

Attachment:
signature.asc

Description: PGP signature