Re: Triple-parity raid6

On 10/06/2011 14:20, Christoph Dittmann wrote:
> On 06/10/2011 10:45 AM, David Brown wrote:
>> Making multiple parity syndromes is easy enough mathematically:
>
> Adam Leventhal, who wrote the double parity and triple parity code for
> ZFS, mentioned on his blog [1] that going beyond triple parity poses
> significant challenges if write performance should not suffer.
>
> In particular, it looks like a relevant math paper (originally) had a
> flaw in the claim that quad parity and above can be implemented just
> like triple parity.
>
> I don't know the implementation you were going to use, neither am I
> knowledgeable about multi-parity in general. I only thought it might be
> relevant to add to the current discussion that other people had issues
> with implementing N-parity for N > 3.



I've looked at Adam's blog article - in fact, stumbling over it when wandering around the 'net was one of my inspirations to think about multi-parity raid again.

But there are a few key differences between the description in the referenced James Plank papers and the implementation I've looked at.

One point is that he is looking at GF(2^4), which quickly becomes a limiting factor. It's hard to get enough independent parity syndromes, and you can easily start to think that things won't work for larger parity.

The other is the choice of factors for the syndromes. Assuming I understand the papers correctly, the first paper suggests this arrangement (all arithmetic done in the Galois field):

P_0 = 1^0 . d_0  +  2^0 . d_1  +  3^0 . d_2  +  4^0 . d_3
P_1 = 1^1 . d_0  +  2^1 . d_1  +  3^1 . d_2  +  4^1 . d_3
P_2 = 1^2 . d_0  +  2^2 . d_1  +  3^2 . d_2  +  4^2 . d_3
P_3 = 1^3 . d_0  +  2^3 . d_1  +  3^3 . d_2  +  4^3 . d_3

The second paper changes it to:

P_0 = 1^0 . d_0  +  1^1 . d_1  +  1^2 . d_2  +  1^3 . d_3
P_1 = 2^0 . d_0  +  2^1 . d_1  +  2^2 . d_2  +  2^3 . d_3
P_2 = 3^0 . d_0  +  3^1 . d_1  +  3^2 . d_2  +  3^3 . d_3
P_3 = 4^0 . d_0  +  4^1 . d_1  +  4^2 . d_2  +  4^3 . d_3


For the first two parity blocks, this is the same as for Linux raid6:

P = d_0 + d_1 + d_2 + d_3
Q = g^0 . d_0  +  g^1 . d_1  +  g^2 . d_2  +  g^3 . d_3
(where g = 2)
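
To make that concrete, here is a quick Python sketch of the same P/Q calculation over GF(2^8), using the 0x11d field polynomial that the Linux raid6 code uses (a toy illustration only, with made-up helper names, not the real implementation):

# Toy GF(2^8) arithmetic - bytes are field elements, addition is XOR,
# multiplication is reduced modulo the 0x11d polynomial.
def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11d
    return r

def gf_pow(a, n):
    r = 1
    for _ in range(n):
        r = gf_mul(r, a)
    return r

def pq_syndromes(data, g=2):
    """P and Q for one byte position across data disks d_0 .. d_(n-1)."""
    p = q = 0
    for j, d in enumerate(data):
        p ^= d                        # P = d_0 + d_1 + ...
        q ^= gf_mul(gf_pow(g, j), d)  # Q = g^0.d_0 + g^1.d_1 + ...
    return p, q

print(pq_syndromes([0x11, 0x22, 0x33, 0x44]))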

But the third (and later) lines are not good - the restoration equations for multiple disk failures are not independent for all combinations of failed disks. For example, if you have 20 data disks and 3 parity disks, there are 1140 different combinations of three data-disk failures. Of these, 92 combinations lead to non-independent equations when you try to restore the data.
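
A brute-force check along these lines should reproduce that kind of figure (a quick untested sketch, reusing gf_mul/gf_pow from the snippet above):

from itertools import combinations

def gf_inv(a):
    return gf_pow(a, 254)    # a^254 = a^-1 in GF(2^8), for a != 0

def gf_det(m):
    """Determinant over GF(2^8) by Gaussian elimination - 0 iff singular."""
    m = [row[:] for row in m]
    n = len(m)
    det = 1
    for c in range(n):
        pivot = next((r for r in range(c, n) if m[r][c]), None)
        if pivot is None:
            return 0
        m[c], m[pivot] = m[pivot], m[c]
        det = gf_mul(det, m[c][c])
        inv = gf_inv(m[c][c])
        for r in range(c + 1, n):
            if m[r][c]:
                f = gf_mul(m[r][c], inv)
                for k in range(c, n):
                    m[r][k] ^= gf_mul(f, m[c][k])
    return det

# Second paper's arrangement: parity row r uses powers of (r + 1), so a
# triple failure (j1, j2, j3) is recoverable iff the matrix
# [[g^j1, g^j2, g^j3] for g in (1, 2, 3)] is non-singular.
bad = [c for c in combinations(range(20), 3)
       if gf_det([[gf_pow(g, j) for j in c] for g in (1, 2, 3)]) == 0]
print(len(bad), "of 1140 combinations are unrecoverable")  # 92, as quoted above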


The equations I am using are:

P_0 = 1^0 . d_0  +  1^1 . d_1  +  1^2 . d_2  +  1^3 . d_3
P_1 = 2^0 . d_0  +  2^1 . d_1  +  2^2 . d_2  +  2^3 . d_3
P_2 = 4^0 . d_0  +  4^1 . d_1  +  4^2 . d_2  +  4^3 . d_3
P_3 = 8^0 . d_0  +  8^1 . d_1  +  8^2 . d_2  +  8^3 . d_3

P_4 would use powers of 16, etc.
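
In code, the generalised calculation is just the Q calculation with a different generator per parity row - roughly like this (again a sketch, reusing gf_mul/gf_pow from above):

def syndromes(data, nparity):
    """P_0 .. P_(nparity-1) for one byte position, parity row r using
    generator 2^r (i.e. 1, 2, 4, 8, ...)."""
    out = []
    for r in range(nparity):
        g = gf_pow(2, r)
        s = 0
        for j, d in enumerate(data):
            s ^= gf_mul(gf_pow(g, j), d)
        out.append(s)
    return out

# nparity=2 reproduces the raid6 P and Q above; nparity=3 adds the
# powers-of-4 row.
print(syndromes([0x11, 0x22, 0x33, 0x44], 3))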

I have checked that the restoration equations are solvable for all combinations of 3 disk failures for triple parity, and all combinations of 4 disk failures for quad parity. The restoration equations can be written in matrix form and are dependent on the indexes of the failed disks. To solve them, you simply invert the matrix and this gives you a linear formula for re-calculating each missing data block. So to check that the data can be restored, you need to check that the determinant of this matrix is non-zero for all combinations of n-disk failures.
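
To see the restoration step in action, a sketch like the following (untested, reusing the helpers above, and handling only data-disk failures) rebuilds a chosen set of lost data blocks from the survivors plus the parity by solving the GF(2^8) linear system:

def gf_solve(a, b):
    """Solve a.x = b over GF(2^8) by Gauss-Jordan elimination.
    Assumes the matrix is non-singular - which is exactly what the
    determinant check guarantees."""
    n = len(a)
    m = [row[:] + [bv] for row, bv in zip(a, b)]     # augmented matrix
    for c in range(n):
        pivot = next(r for r in range(c, n) if m[r][c])
        m[c], m[pivot] = m[pivot], m[c]
        inv = gf_inv(m[c][c])
        m[c] = [gf_mul(inv, v) for v in m[c]]
        for r in range(n):
            if r != c and m[r][c]:
                f = m[r][c]
                m[r] = [v ^ gf_mul(f, w) for v, w in zip(m[r], m[c])]
    return [row[n] for row in m]

def rebuild(data, lost, nparity):
    """Recover the data blocks with indexes in 'lost' (data disks only)
    from the surviving data blocks and the parity blocks."""
    gens = [gf_pow(2, r) for r in range(nparity)]
    par = syndromes(data, nparity)                   # the parity disks' contents
    known = [j for j in range(len(data)) if j not in lost]
    a = [[gf_pow(g, j) for j in lost] for g in gens]
    b = []
    for r, g in enumerate(gens):
        rhs = par[r]
        for j in known:                              # move known terms to the RHS
            rhs ^= gf_mul(gf_pow(g, j), data[j])
        b.append(rhs)
    return dict(zip(lost, gf_solve(a, b)))

data = [0x11, 0x22, 0x33, 0x44, 0x55, 0x66]
print(rebuild(data, (1, 3, 4), 3))    # expect the original d_1, d_3, d_4 back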

I /believe/ these should work fine for at least up to P_7 (i.e., 8 parity disks), but I haven't yet checked more than a few larger parity modes. Checking all 4 disk failures from 253 disks took my inefficient python program about 45 minutes - to check for 8-disk failures would mean finding all combinations of 8 choices for up to 249 disks. If my maths serves me right, that's 316,528,258,780,856 combinations - and for each one, I'd have to find the determinant of an 8x8 matrix over GF(2^8). Fortunately, I don't think there'll be much call for octal-parity RAID in the near future :-)


Of course, all this assumes that my maths is correct!



