Re: Triple parity and beyond

On 21/11/2013 02:28, Stan Hoeppner wrote:
On 11/20/2013 10:16 AM, James Plank wrote:
Hi all -- no real comments, except as I mentioned to Ric, my tutorial
in FAST last February presents Reed-Solomon coding with Cauchy
matrices, and then makes special note of the common pitfall of
assuming that you can append a Vandermonde matrix to an identity
matrix.  Please see
http://web.eecs.utk.edu/~plank/plank/papers/2013-02-11-FAST-Tutorial.pdf,
slides 48-52.

Andrea, does the matrix that you included in an earlier mail (the one
that has Linux RAID-6 in the first two rows) have a general form, or
did you develop it in an ad hoc manner so that it would include Linux
RAID-6 in the first two rows?
Hello Jim,

It's always perilous to follow a Ph.D., so I guess I'm feeling suicidal
today. ;)

I'm not attempting to marginalize Andrea's work here, but I can't help
but ponder what the real value of triple parity RAID is, or quad, or
beyond.  Some time ago parity RAID's primary mission ceased to be
surviving single drive failure, or a 2nd failure during rebuild, and
became mitigating UREs during a drive rebuild.  So we're now talking
about dedicating 3 drives of capacity to avoiding disaster due to
platter defects and secondary drive failure.  For small arrays this is
approaching half the array capacity.  So here parity RAID has largely
lost its capacity advantage over RAID10, yet it still suffers vastly
inferior performance in normal read/write IO, not to mention
rebuild times that are 3-10x longer.

WRT rebuild times, once drives hit 20TB we're looking at 18 hours just
to mirror a drive at full streaming bandwidth, assuming 300MB/s
average--and that is probably being kind to the drive makers.  With 6 or
8 of these drives, I'd guess a typical md/RAID6 rebuild will take at
least 72 hours, probably over 100, and more still for 3P.  And with
larger drive count arrays the rebuild times approach a
week.  Whose users can go a week with degraded performance?  This is
simply unreasonable, at best.  I say it's completely unacceptable.

With these gargantuan drives coming soon, the probability of multiple
UREs during rebuild is pretty high.

No, because if you are right about the very high CPU overhead during rebuild (which I don't see as being that dramatic: Andrea claims 500MB/sec for triple parity, and that is probably parallelizable across multiple cores), then the rebuild speed decreases proportionally, and hence the stress and heating on the drives drop proportionally too, approximating normal operation. And how often have you seen a drive fail within a week during normal operation?
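
To put numbers on that, a quick back-of-the-envelope sketch in Python, using the figures quoted in this thread (20TB drives, 300MB/s streaming, 500MB/s parity computation) purely as assumptions, not measurements:

    # Back-of-the-envelope rebuild time estimate. All numbers are
    # assumptions taken from this thread, not measured values.
    drive_size_mb = 20e6        # hypothetical 20TB drive
    stream_mb_s   = 300.0       # assumed average streaming bandwidth
    parity_mb_s   = 500.0       # claimed triple-parity compute rate, per core
    cores         = 1           # parity math should parallelize across cores

    bottleneck = min(stream_mb_s, parity_mb_s * cores)
    hours = drive_size_mb / bottleneck / 3600
    print("rebuild limited to %.0f MB/s -> about %.1f hours" % (bottleneck, hours))

Whichever of the two rates is the bottleneck sets the rebuild duration; if the CPU is the limit, the drives are simply being read more slowly than they could be.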

But in reality, consider that a non-naive implementation of multiple parity would probably use just the single parity during reconstruction when only one disk has failed, falling back to the additional parities only for the stripes that are unreadable with single parity alone. So the speed, duration and performance penalty of reconstruction would be those of raid5, except in the exceptional case of multiple failures.
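
A minimal sketch of the per-stripe logic I have in mind (Python pseudocode; read_surviving_blocks and the recover_* helpers are hypothetical placeholders, not existing md code):

    # Per-stripe rebuild with one failed disk. The helpers referenced here
    # (read_surviving_blocks, recover_p, recover_pq, recover_pqr) are
    # hypothetical placeholders, not existing md functions.
    def rebuild_stripe(stripe, failed_disk):
        blocks, bad = read_surviving_blocks(stripe, failed_disk)
        if not bad:
            # Common case: everything else read fine, plain raid5 math.
            return recover_p(blocks, failed_disk)
        if len(bad) == 1:
            # One extra unreadable sector (e.g. a URE): use the second parity.
            return recover_pq(blocks, failed_disk, bad)
        if len(bad) == 2:
            # Two extra unreadable sectors: use the third parity.
            return recover_pqr(blocks, failed_disk, bad)
        raise IOError("more failures than parities in this stripe")

The extra parities only cost anything on the (hopefully rare) stripes that actually need them.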


...
What I envision is an array type, something similar to RAID 51, i.e.
striped parity over mirror pairs. ....

I don't like your approach of raid 51: it has the write overhead of raid5 with the space waste of raid1, so it can be used neither as a performance array nor as a capacity array.
In the scope of this discussion (we are talking about very large arrays), the space waste of your solution, higher than 50%, will make it cost roughly double the price.
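
To put numbers on it (Python, taking an 8-drive array just as an example):

    # Usable fraction of raw capacity; 8 drives is just an example count.
    n = 8
    raid10_usable = (n // 2) / float(n)       # mirror pairs: 50%
    raid51_usable = (n // 2 - 1) / float(n)   # parity across the mirror pairs: 37.5%
    triple_usable = (n - 3) / float(n)        # n-3 data drives: 62.5%
    print("raid10 usable fraction:        %.3f" % raid10_usable)
    print("raid51 usable fraction:        %.3f" % raid51_usable)
    print("triple parity usable fraction: %.3f" % triple_usable)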

A competitor for the multiple-parity scheme might be raid65 or raid66, but that is a much dirtier approach than multiple parity if you think about the kind of rmw and overhead it would incur during normal operation.
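
Just to make the rmw point concrete, a rough count of disk I/Os per small (single-block) update, under simplifying assumptions spelled out in the comments:

    # Rough disk-I/O counts for one small (partial-stripe) write. Assumes no
    # caching and a full read-modify-write at every level; illustration only.
    triple_parity_rmw = 4 + 4          # read old data,P,Q,R then write data,P,Q,R
    inner_raid5_rmw   = 2 + 2          # read old data,parity then write data,parity
    # raid65 (raid6 striped over raid5 legs, as an assumed layout): the outer
    # raid6 rmw issues 3 reads and 3 writes; each outer write lands on a raid5
    # leg and becomes a full inner rmw there, while each outer read stays one I/O.
    raid65_rmw = 3 * 1 + 3 * inner_raid5_rmw
    print("triple parity small write: %d disk I/Os" % triple_parity_rmw)   # 8
    print("raid65 small write:        %d disk I/Os" % raid65_rmw)          # 15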

