Re: Suboptimal raid6 linear read speed

On Sun, 20 Jan 2013, Peter Grandi wrote:

> complete recovery; some/most drives IIRC can return the sector
> content that has been reconstructed, which often is wrong in
> only a few bits.

This is not my experience. When I get UREs on a 4k drive, I get read errors on 8 consecutive 512-byte blocks. This is the first time I've heard anyone claim that drives will give back data that is just slightly wrong. Perhaps there should be a command to tell the drive "give me what you've got", but most of the time this is undesirable.

Going back to the BER discussion.

I'm a network engineer. We count BER as the rate of flipped bits, which is detected using a CRC (Ethernet does this). "Under" the CRC one can run G.709 FEC (forward error correction), which detects flipped bits and can still hand a correct result to the CRC checker, because there is enough redundant information (~10 percent overhead) to reconstruct the original data, so the frame passes the CRC check.
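To make the detection side concrete, here's a minimal Python sketch. zlib's CRC-32 uses the same polynomial as the Ethernet FCS, though real framing is more involved, so treat this as an illustration only:

import zlib

frame = bytearray(b"some payload bits on the wire")
fcs = zlib.crc32(frame)              # checksum computed by the sender

frame[3] ^= 0x04                     # one bit flipped in transit

if zlib.crc32(frame) != fcs:
    print("CRC mismatch: frame dropped")   # receiver discards the frame

Without FEC underneath, that dropped frame is the end of the story; with FEC, the flipped bit gets corrected before the CRC ever sees it.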

Typical rated Ethernet BER is 10^-12. This means that if one bit in 10^12 bits sent on the wire is flipped, and a whole packet is thus lost, the link is still within specification. When things are done right, the BER is way better than 10^-12; the norm is a 10GE link running for months without a single user-detectable bit error.
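To put that spec in perspective, a quick back-of-the-envelope in Python (assuming a fully saturated link):

ber = 1e-12
line_rate = 10e9                    # bits per second on 10GE

bits_per_error = 1 / ber            # one flipped bit per 10^12 bits, at spec
seconds = bits_per_error / line_rate
print(f"one bit error every {seconds:.0f} s at full line rate")   # -> 100 s

So a link that is only just within spec would flip roughly one bit per 100 seconds at full load; real links do far better than that.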

In the articles I have read about how hard drives work, they all state that drive manufacturers do very similar things. Bits are stored on the media with extra information so the drive can do FEC, plus a checksum; if the checksum doesn't match, the block is re-read, and if after a while no block with a correct checksum can be served, a URE is reported and the OS sees the read as failed. The advantage of 4k-sector drives is that FEC is more effective on larger blocks, because errors usually turn up in bursts (several flipped bits in a row), so larger blocks mean longer bursts of flipped bits can be corrected. ADSL2+ works in a similar way when 16 ms interleaving is turned on: it smears the bits out over a longer time, so a 0.1 ms disturbance (complete, no bits survive) can be corrected by the FEC.
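As a toy illustration of why interleaving tames bursts (the sizes here are made up, and real codes work on bits or symbols with actual FEC, not strings):

rows, cols = 4, 8
codewords = [[f"w{r}s{c}" for c in range(cols)] for r in range(rows)]

# transmit column by column (interleaved order)
wire = [codewords[r][c] for c in range(cols) for r in range(rows)]

wire[8:12] = ["ERR"] * 4            # a burst wipes out 4 consecutive symbols

# de-interleave at the receiver
rx = [[wire[c * rows + r] for c in range(cols)] for r in range(rows)]
for cw in rx:
    print(cw.count("ERR"), "bad symbol(s) in this codeword")   # 1 each

The 4-symbol burst on the wire lands as a single bad symbol in each of the four codewords, which a FEC code correcting one symbol per codeword can repair; without interleaving the same burst would have destroyed half of one codeword.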

Also, I do agree with you that RAID6 puts mechanical stress on the drives, but my main failure scenario (own experience) is still a single drive failure followed by scattered UREs when reading from the other drives, which RAID6 parity can correct during the resync. RAID6 is economical with 10-12 drives and fits my storage needs (as long as I get ~30 megabytes/s or better large-file read/write performance from the array, I'm fine). Other workloads, as you say, might have other requirements.

A lot of the people I see coming into the IRC channel use RAID5 for their storage needs, and they show up with UREs while reconstructing a failed drive. Taking the array offline and using dd_rescue is something we have to recommend several times a month. This is why I keep recommending RAID6 over RAID5.
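For what it's worth, here's the back-of-the-envelope arithmetic behind that failure mode, assuming the commonly quoted consumer-drive spec of one URE per 10^14 bits read (the drive count and size below are just examples):

drive_size_tb = 2
surviving_drives = 11               # e.g. a 12-drive array after one failure
bits_read = surviving_drives * drive_size_tb * 1e12 * 8

p_clean = (1 - 1e-14) ** bits_read
print(f"chance of rebuilding with no URE: {p_clean:.1%}")   # roughly 17%

On RAID5 a URE during the rebuild means the rebuild fails; on RAID6 the remaining parity corrects it and the resync carries on, which is exactly the scenario I described above.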

More reading:

http://www.high-rely.com/hr_66/blog/why-raid-5-stops-working-in-2009-not/

www.zdnet.com/blog/storage/why-raid-5-stops-working-in-2009/162

I still don't understand how these two articles manage to come to wildly different conclusions when the first one still claims the second is correct :P Well, the ZDNet one matches my own experience and that of the other people I talk to.

--
Mikael Abrahamsson    email: swmike@xxxxxxxxx

