Re: detection/correction of corruption with raid6


 



2008/12/5 Michał Przyłuski <mikylie@xxxxxxxxx>:
> 2008/12/5 Peter Rabbitson <rabbit+list@xxxxxxxxx>:
>> Michał Przyłuski wrote:
>>> Hi,
>>>
>>> 2008/12/5 Redeeman <redeeman@xxxxxxxxxxx>:
>>>> On Fri, 2008-12-05 at 16:09 -0500, Justin Piszcz wrote:
>>>>> On Fri, 5 Dec 2008, Redeeman wrote:
>>>>>
>>>>>> On Fri, 2008-12-05 at 16:02 -0500, Justin Piszcz wrote:
>>>>>>> On Fri, 5 Dec 2008, Redeeman wrote:
>>>>>>>
>>>>>>>> Hello.
>>>>>>>>
>>>>>>>> I was looking at the PDFs linked to from the wiki, and found this:
>>>>>>>> http://kernel.org/pub/linux/kernel/people/hpa/raid6.pdf
>>>>>>>>
>>>>>>>> More specifically, section 4, starting on page 8.
>>>>>>>>
>>>>>>>> Am I understanding this correctly, in that with raid6, linux is capable
>>>>>>>> of detecting if the content on 1 disk is corrupted, and reconstruct it
>>>>>>>> from the remaining disks?
>>>>>>> I ran md/raid6 for awhile, do you mean remap the bad sector on the fly?
>>>>>>> Linux/md raid does not do this afaik.
>>>>>> No, i mean, if one disk does silent corruption
>>>>> What would the error look like?  Both md/Linux & in the 3ware manual
>>>>> recommend you run a 'check' across the raid at least once a week
>>>>> (3ware/raid-verify) and md/Linux in Debian runs a check once a month I
>>>>> believe to eliminate these issues.
>>>>>
>>>>> If you are asking whether a read error of a latent sector from the one
>>>>> disk will result it reading the data from the second disk that is a good
>>>>> question.
>>>> im asking, if one disk in a raid6 setup suddenly decides to flip a few
>>>> bits in some bytes, will it be able to detect that in a scan, and
>>>> correct it? i cant see how it can do it on raid5, but maybe raid6?
>>>
>>> No, not really.
>>> I've been investigating silent corruption for a quite a while now, and
>>> it looks more or less like this.
>>> During a "check" action it'll be detected. During normal operation -
>>> it won't be detected.
>>> Normal (non-degraded) raid5/6 reads don't read parity (or Q syndrome),
>>> they just read data. So they have no idea that something went bad.
>>> Now, worse news is that you cannot really fix it automagically, even
>>> after detecting by a "check" procedure. A "repair" will overwrite
>>> parity and Q syndrome, with new values (new = calculated from what it
>>> seems to be data blocks).
>>>
>>> It is possible (by the theory of Q syndrome, per the article you
>>> linked) to detect which drive is doing a silent corruption with raid6
>>> (and with some extra assumption, that just one drive is doing that).
>>> But it's not implemented.
>>>
>>
>> I'd like to shamelessly bring in an older related thread:
>> http://marc.info/?l=linux-raid&m=120605458309825
>> http://marc.info/?l=linux-raid&m=120618020817057
>>
>> Maybe someone will get inspired, and will actually write the damned thing :)
>
> I concur. Even without a "fix", just printing information which disk
> is suspected of doing silent corruption will be helpful. One can at
> least, fail the disk, and get rid of it. Still better than taking wild
> guesses what went wrong. I'm a silent corruption maniac myself,
> keeping md5's of most bigger/more important files, so my judgment
> might not be fair.
>
> Also, it seems the feature is being asked about about 3-4 times a
> year, which is probably the second most requested feature after
> numerous reshape variations.
> Regards,
> Mike
>
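
For reference, the single-drive localisation described in the quoted text
(section 4 of the linked raid6.pdf) could be sketched roughly as below. This
is only an illustration of the theory, not md's code, which per the thread
does not implement it. The function names (syndromes, locate_corruption) are
made up for this sketch. It assumes the usual RAID-6 field GF(2^8) with
polynomial 0x11d and generator 2, and that at most one drive per stripe is
corrupt.

```python
# GF(2^8) log/antilog tables for polynomial 0x11d, generator 2.
EXP = [0] * 510
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _i in range(255, 510):
    EXP[_i] = EXP[_i - 255]


def gf_mul(a, b):
    """Multiply two field elements using the log tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def syndromes(data):
    """Return (P, Q) for a list of equal-length data blocks, one per drive."""
    n = len(data[0])
    p, q = bytearray(n), bytearray(n)
    for z, block in enumerate(data):
        coeff = EXP[z]  # g**z for data drive z
        for j, byte in enumerate(block):
            p[j] ^= byte
            q[j] ^= gf_mul(coeff, byte)
    return bytes(p), bytes(q)


def locate_corruption(data, p_stored, q_stored):
    """Return None (consistent), 'P', 'Q', a data drive index, or 'multiple'."""
    p_calc, q_calc = syndromes(data)
    suspects = set()
    for ps, pc, qs, qc in zip(p_stored, p_calc, q_stored, q_calc):
        dp, dq = ps ^ pc, qs ^ qc
        if dp == 0 and dq == 0:
            continue                  # this byte position is consistent
        if dp == 0:
            suspects.add("Q")         # only Q disagrees: Q block is bad
        elif dq == 0:
            suspects.add("P")         # only P disagrees: P block is bad
        else:
            # Error X on data drive z gives dp = X and dq = g**z * X,
            # so z = log(dq) - log(dp) (mod 255).
            z = (LOG[dq] - LOG[dp]) % 255
            suspects.add(z if z < len(data) else "multiple")
    if not suspects:
        return None
    return suspects.pop() if len(suspects) == 1 else "multiple"
```

If every mismatching byte position points at the same drive, that drive is
the suspect. Anything else is reported as "multiple", where the single-drive
assumption no longer holds and no verdict should be trusted.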
I'm also very concerned about silent corruption, and we often "verify"
our critical large files by checking their MD5 sums against a known
good value, especially when we copy or move them from one medium to
another.

But in all the cases of silent corruption I've seen, it was never the
disk. Instead I've seen it be the cable, the controller, bad memory, or
a bad power supply, but never the disk itself. That's not to say the
disk controller could not be the cause, just that I have not seen it.

I did not read the relevant threads, but do they cover all of these
sources of silent corruption, or only the case where a disk is the
source?
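
The MD5 check mentioned above can be sketched in a few lines. This is a
minimal illustration rather than the exact procedure we use. The helper
names md5_of_file and verify_file are invented for the example, and the
known good value is assumed to be a hex digest recorded earlier.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Chunked reads keep memory use flat for very large files.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_file(path, known_good_md5):
    """True if the file's current MD5 matches the recorded hex digest."""
    return md5_of_file(path) == known_good_md5.strip().lower()
```

A mismatch only says the bytes read back differ from the recorded ones. It
does not say whether the disk, cable, controller, memory or power supply was
at fault.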

Thanks
Greg
--
Greg Freemyer
Litigation Triage Solutions Specialist
http://www.linkedin.com/in/gregfreemyer
First 99 Days Litigation White Paper -
http://www.norcrossgroup.com/forms/whitepapers/99%20Days%20whitepaper.pdf

The Norcross Group
The Intersection of Evidence & Technology
http://www.norcrossgroup.com



