Re: misunderstanding of spare and raid devices? - and one question more


 



On 01/07/2011 13:29, Robin Hill wrote:
On Fri Jul 01, 2011 at 12:18:22PM +0200, David Brown wrote:

On 01/07/2011 10:50, Robin Hill wrote:
On Fri Jul 01, 2011 at 09:23:43 +0200, David Brown wrote:

What's the difference between a "resync" and a "recovery"?  Is it that a
"resync" will read the whole stripe, check if it is valid and, if it is
not, then generate the parity, while a "recovery" will always
generate the parity?

   From the names, recovery would mean that it's reading from N-1 disks,
and recreating data/parity to rebuild the final disk (as when it
recovers from a drive failure), whereas resync will be reading from all
N disks and checking/recreating the parity (as when you're running a
repair on the array).

The main reason I can see for doing a resync on RAID6 rather than a
recovery is if the data reconstruction from the Q parity is far slower
than the construction of the Q parity itself (I've no idea how the
mathematics works out for this).


Well, data reconstruction from Q parity /is/ more demanding than
constructing the Q parity in the first place (the mathematics is the
part that I know about).  That's why a two-disk degraded raid6 array is
significantly slower (or, more accurately, significantly more
cpu-intensive) than a one-disk degraded raid6 array.

But that doesn't make a difference here - you are rebuilding one or two
disks, so you have to use the data you've got whether you are doing a
resync or a recovery.

Yes, but in a resync all the data you have available is the data
blocks, and you're reconstructing all the P and Q parity blocks. With a
recovery, the data you have available is some of the data blocks and some
of the P & Q parity blocks, so for some stripes you'll be reconstructing
the parity and for others you'll be regenerating the data using the
parity (and for some you'll be doing one of each).


If it were that simple, then the resync (as used when creating a RAID6 array) would not be so much slower than the recovery used in a RAID5 build...

With a resync, you first check whether the parity blocks are correct (by generating them from the data blocks and comparing them against the parity blocks read from disk). If they are not correct, you write out the corrected parity blocks. With a recovery, you /know/ that one block is incorrect and re-generate it (from the data blocks if it is a parity block, or using the parities if it is a data block).
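To make that concrete, here's a rough Python sketch of the two passes over a single stripe, shown for the single-parity raid5 case. This is purely illustrative - not the md driver's code - and xor_blocks() and the block arguments are invented for the example:

    def xor_blocks(blocks):
        # xor a list of equal-sized blocks together (raid5 parity)
        out = bytearray(len(blocks[0]))
        for blk in blocks:
            for i, byte in enumerate(blk):
                out[i] ^= byte
        return bytes(out)

    def resync_stripe(data_blocks, parity_on_disk):
        # read all n blocks, check the parity, and only produce a new
        # parity block if what is on disk turns out to be wrong
        expected = xor_blocks(data_blocks)
        return expected if expected != parity_on_disk else None

    def recover_stripe(surviving_blocks):
        # one block is known to be missing/wrong, so regenerate it
        # unconditionally from the n-1 blocks we do have
        return xor_blocks(surviving_blocks)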

Consider the two cases raid5 and raid6 separately.

When you build your raid5 array, there is nothing worth keeping in the data - the aim is simply to make the stripes consistent. There are two possible routes - consider the data blocks to be "correct" and do a resync to make sure the parity blocks match, or consider the first n-1 disks to be "correct" and do a recovery to make sure the n'th disk matches. For recovery, that means reading n-1 blocks in a stripe, doing a big xor, and writing out the remaining block (whether it is data or parity). For a resync, it means reading all n blocks and checking the xor. If there is no match (which will be the norm when building an array), then the correct parity is calculated and written out. Thus a resync takes longer than a recovery, and a recovery is used.
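As a made-up concrete example, take a 5-disk raid5 being built: recovery reads 4 blocks, xors them, and writes the 5th - 5 I/Os per stripe. A resync reads all 5 blocks, xors 4 of them, compares the result against the 5th and, since a freshly created array will essentially never match, still writes one block - 6 I/Os plus the compare per stripe. One extra read per stripe is not dramatic, but over a whole array it is why the build goes the recovery route.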

When you build your raid6 array, you have the same two choices. For a resync, you have to read all n blocks, calculate P and Q, compare them to what is on disk, then (as there will be no match) write out P and Q. In comparison to the raid5 recovery, you've done a couple of unnecessary block reads and compares, and the time-consuming Q calculation and write. But if you chose recovery, then you'd be assuming the first n-2 blocks are correct and re-calculating the last two blocks. This avoids the extra reads and compares, but if the two parity blocks are within the first n-2 blocks read, then the recovery calculations will be much slower. Hence a resync is faster for raid6.
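For reference, the Q block is a weighted sum over GF(2^8) - the scheme from hpa's "The mathematics of RAID-6" paper, which the kernel's raid6 code is based on. A toy Python sketch of the forward P/Q calculation (again illustrative only, nothing like the optimised kernel code):

    def gf_mul(a, b):
        # multiply in GF(2^8) using the raid6 polynomial 0x11d
        r = 0
        while b:
            if b & 1:
                r ^= a
            a <<= 1
            if a & 0x100:
                a ^= 0x11d
            b >>= 1
        return r

    def pq_syndromes(data_bytes):
        # data_bytes: one byte from each of the n-2 data disks, all at
        # the same offset within the stripe
        p, q = 0, 0
        g_i = 1                    # g^i, with generator g = {02}
        for d in data_bytes:
            p ^= d                 # P is a plain xor, as in raid5
            q ^= gf_mul(g_i, d)    # Q weights disk i by g^i
            g_i = gf_mul(g_i, 2)
        return p, q

Recomputing P and Q is just this forward pass. Recovering two lost *data* blocks instead means solving a pair of these equations for two unknowns over the field, which is the slow path a recovery-style build would keep hitting whenever both parity blocks sit among the blocks assumed correct.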

I suppose the raid6 build could be optimised a little by skipping the extra reads when you know in advance that they will not match. But either that is already being done, or it is considered a small issue that is not worth changing (since it only has an effect during the initial build).




