Re: sb->resync_offset value after resync failure

NeilBrown <neilb@xxxxxxx> · Thu, 26 Jan 2012 11:41:26 +1100

On Tue, 24 Jan 2012 16:25:04 +0200 Alexander Lyakas <alex.bolshoy@xxxxxxxxx>
wrote:

> Hello Neil,
> I hope you can find some time to look at my doubts in the email below.
> Meanwhile I realized I have more doubts about resync. Hopefully, you
> will be able to give some information on those too...
> 
> # I am looking at the "--force" parameter for assembly, and also at the
> "start_dirty_degraded" kernel parameter. They are actually very
> different:
> "force" marks the array as clean (sets sb->resync_offset=MaxSector).
> Whereas if start_dirty_degraded==1, the kernel actually starts resyncing the
> array. For RAID5, it starts and stops immediately (correct?) But for
> RAID6 coming up with one missing drive, kernel will do the resync
> using the remaining redundant drive.
> So start_dirty_degraded==1 is "better" than just forgetting about
> resync with "--force", isn't it? Because we will still have one parity
> block correct.

Correct.  Degraded RAID5 is either clean or faulty.  There is no sense in
which such an array can be simply "dirty", as there is no redundancy.

Singly-degraded RAID6 can be "dirty", as it still has one disk of redundancy
which could be inconsistent with the rest.  So there is an extra state that
we need to handle, which we currently do not.
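The distinction above can be modelled as a small classification of what assembly actually faces. This is a minimal sketch, not md's code: the `State` names, `REDUNDANCY` table and `assess()` helper are hypothetical, and only RAID5/RAID6 are modelled. It shows why a dirty, degraded RAID5 collapses to "faulty" (nothing left to resync against) while a dirty RAID6 missing one device is a distinct, recoverable state.

```python
from enum import Enum


class State(Enum):
    CLEAN = "clean"
    RESYNC = "dirty; resync restores consistency"
    DEGRADED_RESYNC = "dirty and degraded; resync using remaining redundancy"
    FAULTY = "dirty and degraded; no redundancy left to check against"
    FAILED = "too many devices missing to run"


# Number of device failures each level tolerates.
REDUNDANCY = {5: 1, 6: 2}


def assess(level: int, missing: int, dirty: bool) -> State:
    """Classify an array at assembly time (hypothetical illustrative model)."""
    spare = REDUNDANCY[level] - missing
    if spare < 0:
        return State.FAILED
    if not dirty:
        return State.CLEAN
    if missing == 0:
        return State.RESYNC
    if spare >= 1:
        # e.g. RAID6 with one device missing: the remaining parity block
        # may disagree with the data and can be recomputed from it.
        return State.DEGRADED_RESYNC
    # e.g. RAID5 with one device missing: nothing left to resync against.
    return State.FAULTY


for level, missing in [(5, 1), (6, 1), (6, 2)]:
    print(f"RAID{level}, {missing} missing, dirty:", assess(level, missing, True).name)
# RAID5, 1 missing, dirty: FAULTY
# RAID6, 1 missing, dirty: DEGRADED_RESYNC
# RAID6, 2 missing, dirty: FAULTY
```

The `DEGRADED_RESYNC` branch is the "extra state" that a plain clean/faulty view has no room for.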

> 
> Do you think the following logic is appropriate? Always set the
> start_dirty_degraded=1 kernel parameter. In mdadm, during assembly,
> detect dirty+degraded, and if "force" is not given, abort. If "force"
> is given, don't knock off sb->resync_offset (as the code does today);
> assemble the array and let the kernel start resync (if there is still
> a redundant drive).

I would rather export mddev->ok_start_degraded via sysfs. Then mdadm can
respond to the --force flag on a dirty/degraded RAID6 by either setting the
flag if it exists, or marking the array 'clean' and starting a 'repair'.
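The decision described above can be sketched as a planning step that returns the sysfs writes mdadm would make. Assumptions, labelled loudly: the `ok_start_degraded` sysfs attribute is HYPOTHETICAL (it is the proposed export of `mddev->ok_start_degraded`, not an existing file); `array_state` and `sync_action` are existing md attributes, but this sketch ignores when each write is legal during assembly (a repair, for instance, needs a running array). `plan_forced_start()`, `ForceRequired` and `apply()` are illustrative names, not mdadm functions.

```python
import os


class ForceRequired(Exception):
    """Raised when a dirty, degraded array is assembled without --force."""


def plan_forced_start(level, missing, dirty, force, has_ok_start_degraded,
                      md_sysfs="/sys/block/md0/md"):
    """Return the ordered (sysfs path, value) writes for starting the array."""
    if not (dirty and missing > 0):
        return []  # not the dirty+degraded case: nothing special to do
    if not force:
        raise ForceRequired("array is dirty and degraded; use --force")
    array_state = os.path.join(md_sysfs, "array_state")
    if level == 6 and missing == 1:
        if has_ok_start_degraded:
            # HYPOTHETICAL attribute exporting mddev->ok_start_degraded:
            # the kernel then resyncs using the remaining parity device.
            return [(os.path.join(md_sysfs, "ok_start_degraded"), "1")]
        # Fallback: declare the array clean, then rebuild parity with a repair.
        return [(array_state, "clean"),
                (os.path.join(md_sysfs, "sync_action"), "repair")]
    # No redundancy left to resync against: --force can only mark it clean.
    return [(array_state, "clean")]


def apply(actions):
    """Perform the planned writes (requires root and a real md device)."""
    for path, value in actions:
        with open(path, "w") as f:
            f.write(value)


print(plan_forced_start(6, 1, dirty=True, force=True, has_ok_start_degraded=False))
```

Keeping the plan separate from `apply()` makes the policy easy to check without touching a real array.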

> 
> # I saw an explanation on the list that RAID6 always rewrites a full
> stripe. Given this, I don't understand why the initial resync of the
> array is needed. For areas never written to, the parity may remain
> incorrect, because reading data from there is not expected to return
> anything meaningful. For areas that have been written, the parity is
> recalculated during the write, so reads from those areas should see
> correct parity in degraded mode. I must be missing something here;
> can you tell me what?

The initial sync isn't needed and you are free to specify --assume-clean if
you like.
However I don't guarantee that RAID6 will always perform reconstruct-write
(rather than read-modify-write).
Also, most people scrub their arrays periodically and if you haven't done an
initial sync, the first time you do a scrub you will likely get a high
mismatch count, which might be confusing.
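Both caveats can be seen on a toy stripe. This is a minimal sketch under stated assumptions: one byte per device, P as the XOR of the data, and Q computed over GF(2^8) with generator 2 and polynomial 0x11d (the field commonly used for RAID-6); the helper names are hypothetical. Reconstruct-write recomputes P/Q from all the data, so stale parity on a never-synced stripe is discarded. Read-modify-write patches the *old* parity with the change, so stale parity stays stale: rebuilding the freshly written block after a disk failure returns garbage, and a scrub flags the stripe.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8+x^4+x^3+x^2+1 (0x11d)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def gf_pow2(i: int) -> int:
    """Return g**i for the generator g = 2."""
    result = 1
    for _ in range(i):
        result = gf_mul(result, 2)
    return result


def compute_pq(data):
    """P = XOR of data; Q = XOR of g**i * D_i."""
    p = q = 0
    for i, d in enumerate(data):
        p ^= d
        q ^= gf_mul(gf_pow2(i), d)
    return p, q


def reconstruct_write(stripe, idx, new):
    """Rewrite one block and recompute P/Q from all data (ignores old parity)."""
    data = list(stripe["data"])
    data[idx] = new
    p, q = compute_pq(data)
    return {"data": data, "p": p, "q": q}


def read_modify_write(stripe, idx, new):
    """Rewrite one block and patch P/Q from the old parity (trusts it)."""
    delta = stripe["data"][idx] ^ new
    data = list(stripe["data"])
    data[idx] = new
    return {"data": data,
            "p": stripe["p"] ^ delta,
            "q": stripe["q"] ^ gf_mul(gf_pow2(idx), delta)}


def consistent(stripe):
    return compute_pq(stripe["data"]) == (stripe["p"], stripe["q"])


def recover_from_p(stripe, lost):
    """Rebuild the block of a failed device using P only."""
    value = stripe["p"]
    for i, d in enumerate(stripe["data"]):
        if i != lost:
            value ^= d
    return value


def scrub(stripes):
    """Count stripes whose stored P/Q disagree with their data."""
    return sum(1 for s in stripes if not consistent(s))


# A never-synced stripe: the parity bytes are whatever was on the disks.
unsynced = {"data": [0x10, 0x20, 0x40], "p": 0x00, "q": 0x00}

rcw = reconstruct_write(unsynced, 0, 0x99)
rmw = read_modify_write(unsynced, 0, 0x99)
print(consistent(rcw), recover_from_p(rcw, 0) == 0x99)  # True True
print(consistent(rmw), recover_from_p(rmw, 0) == 0x99)  # False False
print(scrub([unsynced, rcw, rmw]))  # 2
```

On a stripe that was synced first, read-modify-write stays consistent, because both update rules are linear in the data; the problem is specific to skipping the initial sync. The scrub count here is per stripe, whereas md reports its own mismatch counter in different units.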

So I resync by default because it is safest.  If you know what you are doing,
then explicitly disabling that is perfectly OK.

NeilBrown