On Thu, 6 Oct 2011 14:48:28 -0400 "Andrei E. Warkentin"
<andrey.warkentin@xxxxxxxxx> wrote:

> Hi group,
>
> This is my first time posting on this mailing list. I've tried looking
> in the archives, but didn't find what I was looking for. I am trying
> to understand how the synchronization code in MD driver works, and I
> am unsure about the exact relation between resync_offset
> and recovery_offset for the 1.X SB format.
>
> sb->resync_offset sets mddev->recovery_cp, which is the last sector
> synchronized/recovered when md_do_sync exits, and isn't
> used for recovery if a bitmap is used.
>
> sb->recovery_offset sets rdev->recovery_offset, which seemingly is a
> per-member-device recovery_cp, but updated at a finer granularity,
> right?
>
> I think my confusion might be stemming from a misunderstanding behind
> what RECOVERY and SYNC implies. I thought that RECOVERY
> means metadata cleanup, while RESYNC is actual syncing of data to
> spares (or re-added previously faulty disks), but then why is there a
> recovery offset? Why isn't a resync offset sufficient?

Yes - that would be the source of your confusion. You have it exactly
backwards.

"SYNC" means making sure all the devices in the array are synchronised -
i.e. ensuring they all reflect the same set of data. This is only
needed after an unclean shutdown. For RAID5, this means checking and if
necessary correcting all the parity blocks. A RAID5 or RAID6 that is
not synchronised but is degraded could have data corruption.

"RECOVERY" happens when you lose a device and need to replace it with a
spare. The data that was (or should have been) on the missing device is
recovered from other sources (e.g. from Parity and Data calculations)
and is written to the spare.

There isn't any point checkpointing a SYNC at regular intervals. If the
shutdown is clean, the SYNC offset can be recorded at that time so that
on restart the sync can continue at the same offset.
If the shutdown is unclean, you need to start again at the beginning
anyway.

Well.... to be fair there might be value in checkpointing a SYNC while
the array is not being written to (and so is marked clean), but that is
a fairly uninteresting corner case.

There is value in checkpointing recovery, but it only works for 1.x
metadata. The 0.90 metadata did not allow a device to be a full member
of the array until it was completely recovered. A partly-recovered
device is still a spare as far as the metadata is concerned.

1.x metadata converts the spare into an array member immediately but
records that only part of it (at first only 0%) is actually recovered.
As recovery progresses we periodically update the recovery offset
(every 6.25% I think).

Now if there is an unclean shutdown the part that was recovered is
certain to be safe - we just resync that - then continue recovery of
the rest, which could have a small corruption, but at least we have
improved the situation.

>
> I'd be willing to submit a few documentation patches just so others
> have an easier time of reading MD code :-).

That would be very much appreciated.

>
> Thank you,

NeilBrown
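
To make the two offsets concrete, here is a minimal toy model in Python.
It is only a sketch of the behaviour described above, not the MD
driver's code: ToyMember, checkpointed_offset, recovery_restart_plan and
sync_restart_offset are hypothetical names, and the 1/16 checkpoint
interval is just the "6.25% I think" guess from the message.

```python
from dataclasses import dataclass

# Assumption: recovery is checkpointed every 1/16 (6.25%) of the device,
# which is the interval the message above only guesses at.
CHECKPOINT_DIVISOR = 16


@dataclass
class ToyMember:
    """Hypothetical stand-in for a 1.x member device under recovery."""
    size: int                 # device size in sectors
    recovery_offset: int = 0  # a new member starts 0% recovered


def checkpointed_offset(size: int, progress: int) -> int:
    """Round recovery progress down to the last periodic checkpoint."""
    step = max(1, size // CHECKPOINT_DIVISOR)
    return (progress // step) * step


def recovery_restart_plan(member: ToyMember):
    """After an unclean shutdown: resync the part already recovered,
    then continue recovery of the rest.

    Returns (resync_range, recovery_range) as (start, end) pairs.
    """
    resync_range = (0, member.recovery_offset)
    recovery_range = (member.recovery_offset, member.size)
    return resync_range, recovery_range


def sync_restart_offset(saved_offset: int, clean_shutdown: bool) -> int:
    """A SYNC continues from the saved offset only after a clean
    shutdown; after an unclean one it starts again from the beginning."""
    return saved_offset if clean_shutdown else 0


# Usage: recovery had reached sector 450 of 1600 when the machine died.
member = ToyMember(size=1600)
member.recovery_offset = checkpointed_offset(member.size, 450)
resync_range, recovery_range = recovery_restart_plan(member)
```

The per-device offset only ever moves in checkpoint-sized steps, so the
work redone after a crash is bounded by one interval, while the
array-wide sync offset is either trusted in full or discarded.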