Re: Questions about bitrot and RAID 5/6

Chris Murphy <lists@xxxxxxxxxxxxxxxxx> · Thu, 30 Jan 2014 13:59:54 -0700

On Jan 30, 2014, at 3:22 AM, Mikael Abrahamsson <swmike@xxxxxxxxx> wrote:

> On Mon, 27 Jan 2014, Chris Murphy wrote:
> 
>>> This is your interpretation of the claim. My interpretation of the data is that you will get an URE for every 10^14 bits read.
>> 
>> How does your interpretation differ if the < sign is removed from the stated spec? Linguistically, how do you alter the description if that symbol is and isn't present?
> 
> The spec is an SLA. The manufacturer will try to beat that number to keep the SLA. Sometimes they're a lot better, sometimes they're worse and then they have to compensate the customer.

The spec being an agreement that activates a warrantied replacement is a plausible argument. And I agree with the characterization that a particular drive may perform better or worse.

But the spec says "less than 1 per 1E14 bits" not "less than or equal to". If we actually get 1 URE in 1E14 bits read, that's busting the spec. An average of 1 URE in 1E14 bits read is likewise busting the spec. And if that were an average across a population it would mean drive manufacturers are on the hook for some 50% of their drives being replaced, and we know that is definitely not happening.

And I'm not reading this as "the first time you get 2 UREs in fewer than 1E14 bits read, you've hit the SLA", although someone could possibly make that argument.

I'd say overwhelmingly drives are performing a lot better than this spec. The idea we should expect a URE on average just by reading a 4TB drive three times in a row makes no sense. People would be having all sorts of problems, which they aren't. And if it really were a mean, this should be readily reproducible yet it isn't. In 12x full reads of three 3TB drives, zero UREs. That's over 100TB read without a URE. And these are ~2 year old drives.
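
A rough sketch of that arithmetic, assuming (purely for illustration) that UREs were independent events arriving at a mean rate of exactly 1 per 1E14 bits read. The Poisson model and the function names are assumptions of this sketch, not something the spec states:

```python
import math

BITS_PER_TB = 8 * 10**12  # decimal terabytes, 8 bits per byte


def expected_ures(tb_read, bits_per_ure=1e14):
    """Expected URE count if '1 per bits_per_ure' were a mean rate (assumption)."""
    return tb_read * BITS_PER_TB / bits_per_ure


def prob_zero_ures(tb_read, bits_per_ure=1e14):
    """Poisson probability of reading tb_read terabytes with no URE at all."""
    return math.exp(-expected_ures(tb_read, bits_per_ure))


# Three full passes over a 4TB drive: about one expected URE.
print(f"3 x 4TB: {expected_ures(3 * 4):.2f} expected UREs")

# Twelve full reads of three 3TB drives (108TB in total).
tb = 12 * 3 * 3
print(f"{tb}TB: {expected_ures(tb):.2f} expected UREs, "
      f"P(no URE) = {prob_zero_ures(tb):.1e}")
```

Under that assumed model, 108TB of reads would be expected to produce between 8 and 9 UREs, and seeing none at all would have a probability of roughly 0.02%. That is hard to square with the spec being a mean.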

> 
>> And we also know that the size of the packet does affect error rates, just not within an order of magnitude, such is also the case with HDDs between conventional and AF disks. But the allowance of up to but not including an order of magnitude is necessarily implied by the less than sign or it wouldn't be there. It's a continuum, it's not a statement of what will happen on average. It's a statement that error will occur but won't exceed X errors in Y bits.
> 
> If you run the connection full, the packet size doesn't affect the bit error rate, only the result of the bit error.

Packet size doesn't affect the raw bit error rate, but it does affect the packet error rate. Bigger packets mean a higher packet error rate. The URE is argued to be "errors per bits read" not "bit errors per bits read" so comparing URE to BER is mixing units.
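
To make the distinction concrete, here is a minimal sketch of packet error rate as a function of packet size at a fixed raw BER. It assumes independent bit errors and that any single bit error fails the whole packet (no correction). The BER value is hypothetical and picked only for illustration:

```python
import math


def packet_error_rate(ber, packet_bytes):
    """P(at least one bit error in the packet), assuming independent bit errors."""
    bits = packet_bytes * 8
    # 1 - (1 - ber)**bits, written with log1p/expm1 to stay accurate for tiny ber
    return -math.expm1(bits * math.log1p(-ber))


ber = 1e-9  # hypothetical raw bit error rate, for illustration only
for size in (64, 1500, 9000):
    print(f"{size:>5} bytes: PER = {packet_error_rate(ber, size):.3e}")
```

The BER is the same on every line, while the packet error rate grows roughly in proportion to the packet size.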

The URE is more analogous to packet error rate. The limitation with that comparison is that network CRC is the same regardless of packet size, whereas the ECC in 512 byte and 4096 byte sector drives is not the same.

> 
>>> Where exactly in the available data does it say that?
>> 
>> Drive manufacturers saying AF disks means better error correction both because of the larger size of sectors, and also more ECC bits.
>> 
>> http://www.snia.org/sites/default/files2/SDC2011/presentations/wednesday/CurtisStevens_Advanced_Format_Legacy.pdf
>> 
>> http://storage.toshiba.eu/export/sites/toshiba-sdd/media/downloads/advanced_format/4KWhitePaper_TEG.pdf
>> ", if the data field to be protected in each sector is larger than 512 bytes, the ECC algorithm could be improved to correct for a higher number of bits in error"
>> 
>> http://www.idema.org/wp-content/plugins/download-monitor/download.php?id=1244
>> page 3, 2nd figure "errors per 512 bytes vs physical block size"
>> 
>> There's loads of information on this…
> 
> The 4k sector design is an internal design means to achieve the specified SLA. So while 4k ECC is better, the manufacturer might use a higher density with a higher bit error rate, but which end result is still within the offered SLA because of better error correction method.
> 
> So we're back to what the 10^-14 means. This is all you have to go on, because internally the manufacturer is free to use 512b sector size, 4k sector size, or pixie dust to achieve the specs they're offering the end customer. There is nothing that says that you as a customer gets to partake in any improvement due to internal changes within the unit.

Agreed, insofar as we only know the max error rate anticipated by the spec. We do not know the average occurrence based on the spec. To compute that we need a scientific sample of drives, with all of the drives producing error rates greater than 1 URE in 1E14 bits discarded. An unweighted average would be useless because such drives should trigger a warranty replacement. And I don't know of any published studies that have done that - presumably this has been done by drive manufacturers though.
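
One possible way to sketch that calculation: drop every drive whose observed rate is not strictly below the spec limit, then pool the remaining UREs over the remaining bits read. The function name, the pooling by bits read, and the sample numbers are all assumptions of this sketch. The numbers are made-up placeholders, not measurements:

```python
SPEC_LIMIT = 1 / 1e14  # spec: fewer than 1 URE per 1E14 bits read


def in_spec_ure_rate(drives, limit=SPEC_LIMIT):
    """Pooled UREs-per-bit over drives whose own rate is strictly below limit.

    drives: iterable of (ure_count, bits_read) pairs, one per drive.
    Returns None if no drive is within spec.
    """
    kept = [(u, b) for u, b in drives if b > 0 and u / b < limit]
    total_bits = sum(b for _, b in kept)
    if total_bits == 0:
        return None
    return sum(u for u, _ in kept) / total_bits


# Made-up placeholder numbers, NOT measurements from any real drive.
sample = [(0, 8e14), (2, 4e14), (9, 3e14)]
rate = in_spec_ure_rate(sample)  # the third drive exceeds the spec and is dropped
print(f"in-spec pooled rate: 1 URE per {1 / rate:.1e} bits")
```

A drive sitting at exactly 1 URE per 1E14 bits is discarded as well, matching the "less than" reading of the spec.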

Chris Murphy
