On 11/20/2013 10:16 AM, James Plank wrote:
> Hi all -- no real comments, except as I mentioned to Ric, my tutorial
> in FAST last February presents Reed-Solomon coding with Cauchy
> matrices, and then makes special note of the common pitfall of
> assuming that you can append a Vandermonde matrix to an identity
> matrix. Please see
> http://web.eecs.utk.edu/~plank/plank/papers/2013-02-11-FAST-Tutorial.pdf,
> slides 48-52.
>
> Andrea, does the matrix that you included in an earlier mail (the one
> that has Linux RAID-6 in the first two rows) have a general form, or
> did you develop it in an ad hoc manner so that it would include Linux
> RAID-6 in the first two rows?

Hello Jim,

It's always perilous to follow a Ph.D., so I guess I'm feeling suicidal
today. ;)

I'm not attempting to marginalize Andrea's work here, but I can't help
but ponder what the real value of triple-parity RAID is, or quad, or
beyond. Some time ago parity RAID's primary mission ceased to be
surviving a single drive failure, or a 2nd failure during rebuild, and
became mitigating UREs during a drive rebuild. So we're now talking
about dedicating 3 drives' worth of capacity to avoiding disaster due to
platter defects and secondary drive failures. For small arrays this is
approaching half the array capacity, so parity RAID has all but lost its
capacity advantage over RAID10, yet it still suffers vastly inferior
performance in normal read/write IO, not to mention rebuild times that
are 3-10x longer.

WRT rebuild times, once drives hit 20TB we're looking at roughly 18
hours just to mirror a single drive at full streaming bandwidth,
assuming a 300MB/s average--and that is probably being kind to the
drive makers. With 6 or 8 of these drives, I'd guess a typical md/RAID6
rebuild will take at minimum 72 hours, probably over 100, and more yet
for 3P. With larger drive counts the rebuild times approach a week.
Whose users can go a week with degraded performance? That is
unreasonable at best; I say it's completely unacceptable.

With these gargantuan drives coming soon, the probability of multiple
UREs during a rebuild is pretty high. Continuing to use ever more
complex parity RAID schemes simply increases rebuild time further, and
the longer the rebuild, the more likely a subsequent drive failure due
to heat buildup, vibration, etc. Thus, in our maniacal efforts to
mitigate one failure mode we're increasing the probability of another.
TANSTAAFL. Worse yet, RAID10 isn't a way out either, because UREs on a
single drive are increasingly likely with these larger drives, and one
URE during a rebuild destroys the array.

I think people are going to have to come to grips with using more and
more drives simply to brace the legs holding up their arrays, come to
grips with these insane rebuild times, or bite the bullet they've so
steadfastly avoided and go with RAID10. Lots more spindles solve
problems, but at a greater cost--again, no free lunch.

What I envision is an array type something like RAID 51, i.e. striped
parity over mirror pairs. In the case of Linux, this would need to be a
new, distinct md/RAID level, as both the RAID5 and RAID1 code would
need enhancement before being meshed together into this new level [1].
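Before the pros and cons, here's a rough back-of-the-envelope sketch in
Python to put numbers behind the claims. The drive size, sustained rate,
and usable-capacity formulas are my assumptions for illustration only,
and "raid51" below is the proposed layout, not an existing md level:

# Rough comparison: RAID 10 vs the proposed "RAID 51" (RAID 5 striped
# over RAID 1 mirror pairs) vs RAID 6 and triple parity ("3p").
# Assumed numbers: 20 TB drives, 300 MB/s sustained streaming rate.

DRIVE_TB   = 20      # assumed drive size
STREAM_MBS = 300     # assumed sustained mirror/rebuild rate

def hours_to_resync_one_drive(tb=DRIVE_TB, mbs=STREAM_MBS):
    """Time to re-mirror one replaced drive at full streaming speed."""
    return tb * 1e6 / mbs / 3600     # TB -> MB, then seconds -> hours

def usable_drives(total, layout):
    """Usable capacity, in whole drives, for an even drive count."""
    pairs = total // 2
    return {
        "raid10": pairs,         # n mirror pairs -> n drives usable
        "raid51": pairs - 1,     # RAID5 over n pairs -> n-1 usable
        "raid6":  total - 2,     # double parity
        "3p":     total - 3,     # triple parity
    }[layout]

# Failure tolerance of the proposed layout, for reference: it always
# survives 3 failures (both legs of one pair plus one drive elsewhere),
# and at best survives half the drives plus one (both legs of one pair
# plus one leg of every other pair).

if __name__ == "__main__":
    print("single-drive resync: %.1f hours" % hours_to_resync_one_drive())
    for total in (6, 8, 12, 16):
        print(total, "drives:",
              {l: usable_drives(total, l)
               for l in ("raid10", "raid51", "raid6", "3p")})

At 300MB/s the single-drive number lands right around the 18 hours
mentioned above, and the capacity table shows both the "+1 disk vs
RAID 10" overhead and how far behind 2P/3P it falls on large arrays.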
Potential Advantages:

1. Only +1 disk of capacity overhead vs RAID 10, regardless of drive
   count.
2. Rebuild time is the same as RAID 10, unless a mirror pair is lost.
3. Parity is only used during a rebuild if/when a URE occurs, or if a
   mirror pair is lost (as in 2).
4. A single drive failure doesn't degrade the parity array, and
   multiple failures in different mirrors don't degrade it either.
5. Can sustain a minimum of 3 simultaneous drive failures: both drives
   in one mirror and one drive in another mirror.
6. Can lose a maximum of half the drives plus 1--one more than RAID 10.
   Can lose half the drives and still not degrade parity, provided no
   two of them comprise one mirror.
7. Similar or possibly better read throughput vs triple-parity RAID.
8. Superior write performance with drives down.
9. Vastly superior rebuild performance, as rebuilds will rarely, if
   ever, involve parity.

Potential Disadvantages:

1. +1 disk of overhead vs RAID 10, and many more disks of overhead than
   2P/3P with large arrays.
2. Read-modify-write penalty vs RAID 10.
3. Slower write throughput vs triple-parity RAID due to the spindle
   deficit.
4. Development effort.
5. ??

[1] The RAID1/5 code would need to be patched to properly handle a URE
encountered by the RAID1 code during a rebuild. There are surely other
modifications and/or optimizations that would be needed. For large
sequential reads, more deterministic read interleaving between the two
drives of a mirror pair would be a good candidate, I think. IIUC the
RAID1 driver does read interleaving on a per-thread basis or some such,
which I don't believe is going to work for this "RAID 51" scenario, at
least not for single streaming reads (a toy sketch of the idea is in
the P.S. below). If this can be done well, we double the read
performance of RAID5, and thus we don't completely "waste" all the
extra disks vs big-parity schemes.

This proposed "RAID level 51" should have drastically lower rebuild
times vs traditional striped parity, should not suffer read/write
performance degradation in most disk-failure scenarios, and with a
read-interleaving optimization may have significantly greater streaming
read throughput as well.

This is far from a perfect solution and I am certainly not promoting it
as such. But I think it does have some serious advantages over
traditional striped-parity schemes, and at minimum it is worth
discussing as a counterpoint of sorts.

--
Stan
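P.S. Since [1] only waves its hands at what "deterministic read
interleaving" might look like, here is a toy Python sketch of the idea.
It is purely a model: the 1MiB chunk size and the alternate-per-chunk
rule are placeholders I made up, not anything the md RAID1 driver
actually does today.

# Toy model of deterministic read interleaving across the two legs of
# one mirror pair for a single large sequential read.  Ignores seeks,
# readahead, stripe boundaries, and everything else real md must handle.

CHUNK = 1 << 20   # 1 MiB per leg per turn (assumed)

def split_read(offset, length, chunk=CHUNK):
    """Yield (leg, offset, length) triples, alternating legs per chunk.

    Both legs of a RAID1 pair hold identical data, so either leg may
    serve any chunk; alternating deterministically keeps both spindles
    busy for one streaming reader instead of sending every read to a
    single leg."""
    pos = offset
    end = offset + length
    while pos < end:
        n = min(chunk, end - pos)
        leg = (pos // chunk) % 2      # even chunks -> leg 0, odd -> leg 1
        yield leg, pos, n
        pos += n

if __name__ == "__main__":
    # A 10 MiB sequential read: chunks alternate leg 0, 1, 0, 1, ...
    for leg, off, n in split_read(0, 10 * CHUNK):
        print("leg %d reads %d bytes at offset %d" % (leg, n, off))

The point is only that, because both halves of a pair hold identical
data, a single streaming reader can keep both spindles busy, which is
where the hoped-for doubling of RAID5 read throughput would come from.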