Re: misunderstanding of spare and raid devices? - and one question more


 



On 01/07/2011 13:29, Robin Hill wrote:
On Fri Jul 01, 2011 at 12:18:22PM +0200, David Brown wrote:

On 01/07/2011 10:50, Robin Hill wrote:
On Fri Jul 01, 2011 at 09:23:43 +0200, David Brown wrote:

What's the difference between a "resync" and a "recovery"?  Is it that a
"resync" will read the whole stripe, check if it is valid and, if it is
not, then generate the parity, while a "recovery" will always
generate the parity?

   From the names, recovery would mean that it's reading from N-1 disks,
and recreating data/parity to rebuild the final disk (as when it
recovers from a drive failure), whereas resync will be reading from all
N disks and checking/recreating the parity (as when you're running a
repair on the array).

The main reason I can see for doing a resync on RAID6 rather than a
recovery is if the data reconstruction from the Q parity is far slower
than the construction of the Q parity itself (I've no idea how the
mathematics works out for this).


Well, data reconstruction from Q parity /is/ more demanding than
constructing the Q parity in the first place (the mathematics is the
part that I know about).  That's why a two-disk degraded raid6 array is
significantly slower (or, more accurately, significantly more
cpu-intensive) than a one-disk degraded raid6 array.

But that doesn't make a difference here - you are rebuilding one or two
disks, so you have to use the data you've got whether you are doing a
resync or a recovery.

Yes, but in a resync all the data you have available is the data
blocks, and you're reconstructing all the P and Q parity blocks. With a
recovery, the data you have available is some of the data blocks and some
of the P & Q parity blocks, so for some stripes you'll be reconstructing
the parity and for others you'll be regenerating the data using the
parity (and for some you'll be doing one of each).


If it were that simple, then the resync (as used when creating a RAID6 array) would not be so much slower than the recovery used in a RAID5 build...

With a resync, you first check whether the parity blocks are correct (by generating them from the data blocks and comparing them against the parity blocks read from disk). If they are not correct, you write out the corrected parity blocks. With a recovery, you /know/ that one block is incorrect and re-generate it (from the data blocks if it is a parity block, or using the parities if it is a data block).
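To make that concrete, here's a rough Python sketch of the two passes over a single stripe, shown for the single-parity raid5 case. This is purely illustrative - not the md driver's code - and xor_blocks() and the block arguments are invented for the example:

    def xor_blocks(blocks):
        # xor a list of equal-sized blocks together (raid5 parity)
        out = bytearray(len(blocks[0]))
        for blk in blocks:
            for i, byte in enumerate(blk):
                out[i] ^= byte
        return bytes(out)

    def resync_stripe(data_blocks, parity_on_disk):
        # read all n blocks, check the parity, and only produce a new
        # parity block if what is on disk turns out to be wrong
        expected = xor_blocks(data_blocks)
        return expected if expected != parity_on_disk else None

    def recover_stripe(surviving_blocks):
        # one block is known to be missing/wrong, so regenerate it
        # unconditionally from the n-1 blocks we do have
        return xor_blocks(surviving_blocks)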

Consider the two cases raid5 and raid6 separately.

When you build your raid5 array, there is nothing worth keeping in the data - the aim is simply to make the stripes consistent. There are two possible routes - consider the data blocks to be "correct" and do a resync to make sure the parity blocks match, or consider the first n-1 disks to be "correct" and do a recovery to make sure the n'th disk matches. For recovery, that means reading n-1 blocks in a stripe, doing a big xor, and writing out the remaining block (whether it is data or parity). For a resync, it means reading all n blocks and checking the xor. If there is no match (which will be the norm when building an array), then the correct parity is calculated and written out. Thus a resync takes longer than a recovery, and a recovery is used.
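As a made-up concrete example, take a 5-disk raid5 being built: recovery reads 4 blocks, xors them, and writes the 5th - 5 I/Os per stripe. A resync reads all 5 blocks, xors 4 of them, compares the result against the 5th and, since a freshly created array will essentially never match, still writes one block - 6 I/Os plus the compare per stripe. One extra read per stripe is not dramatic, but over a whole array it is why the build goes the recovery route.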

When you build your raid6 array, you have the same two choices. For a resync, you have to read all n blocks, calculate P and Q, compare them to what is on disk, then (as there will be no match) write out P and Q. In comparison to the raid5 recovery, you've done a couple of unnecessary block reads and compares, and the time-consuming Q calculation and write. But if you chose recovery, then you'd be assuming the first n-2 blocks are correct and re-calculating the last two blocks. This avoids the extra reads and compares, but if the two parity blocks are within the first n-2 blocks read, then the recovery calculations will be much slower. Hence a resync is faster for raid6.
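For reference, the Q block is a weighted sum over GF(2^8) - the scheme from hpa's "The mathematics of RAID-6" paper, which the kernel's raid6 code is based on. A toy Python sketch of the forward P/Q calculation (again illustrative only, nothing like the optimised kernel code):

    def gf_mul(a, b):
        # multiply in GF(2^8) using the raid6 polynomial 0x11d
        r = 0
        while b:
            if b & 1:
                r ^= a
            a <<= 1
            if a & 0x100:
                a ^= 0x11d
            b >>= 1
        return r

    def pq_syndromes(data_bytes):
        # data_bytes: one byte from each of the n-2 data disks, all at
        # the same offset within the stripe
        p, q = 0, 0
        g_i = 1                    # g^i, with generator g = {02}
        for d in data_bytes:
            p ^= d                 # P is a plain xor, as in raid5
            q ^= gf_mul(g_i, d)    # Q weights disk i by g^i
            g_i = gf_mul(g_i, 2)
        return p, q

Recomputing P and Q is just this forward pass. Recovering two lost *data* blocks instead means solving a pair of these equations for two unknowns over the field, which is the slow path a recovery-style build would keep hitting whenever both parity blocks sit among the blocks assumed correct.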

I suppose the raid6 build could be optimised a little by skipping the extra reads when you know in advance that they will not match. But either that is already being done, or it is considered a small issue that is not worth changing (since it only has an effect during the initial build).




