Re: Clarification behind md 1.0 superblock resync_offset and recovery_offset?

On Thu, 6 Oct 2011 14:48:28 -0400 "Andrei E. Warkentin"
<andrey.warkentin@xxxxxxxxx> wrote:

> Hi group,
> 
> This is my first time posting on this mailing list. I've tried looking
> in the archives, but didn't find what I was looking for. I am trying
> to understand how the synchronization code in MD driver works, and I
> am unsure about the exact relation between resync_offset
> and recovery_offset for the 1.X SB format.
> 
> sb->resync_offset sets mddev->recovery_cp, which is the last sector
> synchronized/recovered when md_do_sync exits, and isn't
> used for recovery if a bitmap is used.
> 
> sb->recovery_offset sets rdev->recovery_offset, which seemingly is a
> per-member-device recovery_cp, but updated at a finer granularity,
> right?
> 
> I think my confusion might be stemming from a misunderstanding of
> what RECOVERY and SYNC imply. I thought that RECOVERY
> means metadata cleanup, while RESYNC is actual syncing of data to
> spares (or re-added previously faulty disks), but then why is there a
> recovery offset? Why isn't a resync offset sufficient?

Yes - that would be the source of your confusion.  You have it exactly
backwards.
"SYNC" means making sure all the devices in the array synchronised - i.e.
ensuring they all reflect the same set of data.  This is only needed after
and unclean shutdown.  For RAID5, this means checking and  if necessary
correcting all the parity blocks.  A RAID5 or RAID6 that is not synchronised
by is degraded could have data corruption.
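
To make that concrete, here is a rough user-space sketch of what a SYNC
amounts to for one RAID5 stripe.  The names and structure are invented
for illustration - this is not the md driver's actual code:

#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 4096

/* Recompute the parity of one stripe from its data blocks and rewrite
 * it if it does not match what is on disk.  Returns 1 if it was fixed. */
static int sync_stripe(uint8_t data[][BLOCK_SIZE], int ndata,
                       uint8_t parity[BLOCK_SIZE])
{
    uint8_t computed[BLOCK_SIZE] = { 0 };
    int d, i;

    for (d = 0; d < ndata; d++)
        for (i = 0; i < BLOCK_SIZE; i++)
            computed[i] ^= data[d][i];        /* RAID5 parity is XOR */

    if (memcmp(computed, parity, BLOCK_SIZE) != 0) {
        memcpy(parity, computed, BLOCK_SIZE); /* correct stale parity */
        return 1;
    }
    return 0;
}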

"RECOVERY" happens when you lose a device and need to replace it with a
spare.  The data that was (or should have been) on the missing device is
recovered from other sources (e.g. from parity and data calculations) and is
written to the spare.
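
The corresponding per-block operation, again as an invented-name sketch
rather than the real raid5.c code, is just the XOR of everything that
survived:

#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 4096

/* Rebuild the block that belonged on the lost device: for RAID5 it is
 * the XOR of the corresponding blocks (data and parity alike) on all
 * surviving devices.  The result is then written to the spare. */
static void recover_block(uint8_t surviving[][BLOCK_SIZE], int nsurviving,
                          uint8_t spare[BLOCK_SIZE])
{
    int d, i;

    memset(spare, 0, BLOCK_SIZE);
    for (d = 0; d < nsurviving; d++)
        for (i = 0; i < BLOCK_SIZE; i++)
            spare[i] ^= surviving[d][i];
}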

There isn't any point checkpointing a SYNC at regular intervals.  If the
shutdown is clean, the SYNC offset can be recorded at that time so that on
restart the sync can continue at the same offset.  If the shutdown is
unclean, you need to start again at the beginning anyway.
Well... to be fair there might be value in checkpointing a SYNC while the
array is not being written to (and so is marked clean), but that is a fairly
uninteresting corner case.

There is value in checkpointing recovery, but it only works for 1.x metadata.
The 0.90 metadata did not allow a device to be a full member of the array
until it was completely recovered.  A partly-recovered device is still a
spare as far as that metadata is concerned.
1.x metadata converts the spare into an array member immediately but records
that only part of it (at first only 0%) is actually recovered.  As recovery
progresses we periodically update the recovery offset (every 6.25% I think).
Now if there is an unclean shutdown, the part that was already recovered is
certain to be safe - we just resync that - and then continue recovery of the
rest, which could have a small corruption, but at least we have improved the
situation.
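
So the restart logic after such a crash is roughly the following (the
helper names here are invented to show the shape of it, nothing more):

#include <stdint.h>

typedef uint64_t sector_t;

/* Stubs standing in for the real resync and recovery loops. */
static void resync_range(sector_t from, sector_t to)
{ /* check and, if needed, repair parity over [from, to) */ }
static void recover_range(sector_t from, sector_t to)
{ /* rebuild [from, to) from the surviving devices */ }

/* After an unclean shutdown that interrupted a recovery: everything
 * below the checkpointed recovery_offset already holds real data, so
 * it only needs a resync; everything above it still needs recovery. */
static void restart_interrupted_recovery(sector_t recovery_offset,
                                         sector_t dev_sectors)
{
    resync_range(0, recovery_offset);
    recover_range(recovery_offset, dev_sectors);
}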



> 
> I'd be willing to submit a few documentation patches just so others
> have an easier time of reading MD code :-).

That would be very much appreciated.

> 
> Thank you,


NeilBrown
