Re: md-raid paranoia mode?

On 11/06/14 14:48, Bart Kus wrote:
> Hello,
>
> As far as I understand, md-raid relies on the underlying devices to
> inform it of IO errors before it'll seek redundant/parity data to
> fulfill the read request.  I have, however, seen certain hard drives
> report successful reads while returning garbage data.

If you have drives that return garbage as valid data, then you have far greater problems than your suggestion will fix. So much so that I suggest you document these instances and start banging a drum about them in a name-and-shame campaign. That sort of behavior from storage devices is never OK, and the manufacturer needs to know that.

This comes up on the list at least once a year, and the upshot is that your storage platform needs to be reliable. Storage is *supposed* to be reliable. Even the cheapest solution is *supposed* to say "I'm sorry but that bit of data you asked for is toast". Even my 35c USB drives do that.

Whether you have a single drive or 10 mirrors, if you have a drive returning garbage you need to solve that problem first. md is built on the fundamental assumption that the storage stack knows when something is bad. Patching it to stop trusting that assumption makes all sorts of guarantees go out the window.

From personal experience, I lost a 12TB RAID-6 and all the data on it due to a bad SATA controller. The controller would return corrupt reads under heavy load, and months of read/modify/write cycles combined with corrupt data spread the corruption all over the array. My immediate reaction was the same as yours: "RAID-6 should be able to protect against this stuff". But after some education from people more knowledgeable than I am, it became apparent that bad hardware is JUST insidious, and papering over one part of the stack would just lead to it biting me elsewhere anyway.
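To illustrate how a read/modify/write cycle can spread a silently corrupt read, here is a minimal sketch in Python. It assumes a simplified single-parity (XOR) stripe rather than md's actual RAID-6 code, and the names xor_blocks and rmw_parity are made up for the illustration. The point is only that a parity update computed from a bad read of one block damages the later reconstruction of a different block that was never misread.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def rmw_parity(old_parity: bytes, old_data_as_read: bytes, new_data: bytes) -> bytes:
    """Read/modify/write parity update: P' = P xor D_old xor D_new.

    The result is only correct if old_data_as_read is what is really on disk.
    """
    return xor_blocks(old_parity, old_data_as_read, new_data)


# A three-data-block stripe with consistent parity (toy 4-byte blocks).
d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"
parity = xor_blocks(d0, d1, d2)

# Overwrite d0. The controller returns a "successful" but corrupt read of
# the old d0 (last byte is wrong) and no error is reported.
new_d0 = b"ZZZZ"
corrupt_read_of_d0 = b"AAAQ"
parity = rmw_parity(parity, corrupt_read_of_d0, new_d0)
d0 = new_d0

# Later the disk holding d1 fails and d1 is rebuilt from parity.
rebuilt_d1 = xor_blocks(parity, d0, d2)
print("original d1:", d1)
print("rebuilt d1: ", rebuilt_d1)
print("intact:", rebuilt_d1 == d1)
```

In this toy model the bad read of d0 ends up as damage in the rebuilt d1, which is the sense in which the corruption "spreads".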

I learned two very valuable lessons.
- Don't deploy hardware unless you trust it. This may mean a month of burn-in testing in a spare machine, or delaying trusting it with valuable data. In my case it was a cheap 2-port PCIe SATA card procured to get me out of a tight spot, so I plugged it in and strapped drives to it, blindly believing it would be OK.
- RAID is no substitute for backups.
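As a rough idea of what one burn-in pass could look like, here is a minimal Python sketch that writes random blocks to a file on the hardware under test, records a SHA-256 digest per block, and later reads everything back to find blocks that changed without an IO error. The function names and the default block size are my own choices for the illustration, not an existing tool. It also assumes the read-back really comes from the device: for a meaningful test the data set needs to be larger than RAM, or the page cache needs to be dropped between the write and the verify.

```python
import hashlib
import os

BLOCK_SIZE = 1024 * 1024  # 1 MiB per block; an arbitrary choice for this sketch


def write_pattern(path, num_blocks, block_size=BLOCK_SIZE):
    """Write num_blocks random blocks to path and return their SHA-256 digests."""
    digests = []
    with open(path, "wb") as f:
        for _ in range(num_blocks):
            block = os.urandom(block_size)
            digests.append(hashlib.sha256(block).digest())
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # push the data out of userspace and OS buffers
    return digests


def verify_pattern(path, digests, block_size=BLOCK_SIZE):
    """Re-read path and return the indices of blocks whose digest changed."""
    bad = []
    with open(path, "rb") as f:
        for index, expected in enumerate(digests):
            block = f.read(block_size)
            if hashlib.sha256(block).digest() != expected:
                bad.append(index)
    return bad
```

Calling write_pattern once and then verify_pattern repeatedly, under heavy load and over days or weeks, is the kind of test that would have caught a controller like mine. Any non-empty result means the stack returned garbage as a successful read.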





