Re: Questions about bitrot and RAID 5/6

Chris Murphy <lists@xxxxxxxxxxxxxxxxx> · Mon, 27 Jan 2014 11:20:50 -0700

On Jan 27, 2014, at 12:16 AM, Mikael Abrahamsson <swmike@xxxxxxxxx> wrote:

> On Sun, 26 Jan 2014, Chris Murphy wrote:
> 
>> I accept that. But since there are two sector sizes, one URE does not represent the same amount of data loss.
> 
> This is your interpretation of the claim. My interpretation of the data is that you will get a URE for every 10^14 bits read.

How does your interpretation differ if the < sign is removed from the stated spec? Linguistically, how does the description change depending on whether that symbol is present?
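To make the difference concrete, here is a rough sketch of the arithmetic, assuming bit errors are independent events (a Poisson model, which no spec sheet actually states). If the figure is read as exactly one URE per 10^14 bits, the function gives a prediction. If it is read as "< 1 in 10^14", the same numbers are only upper bounds. The 4 TB read size is a hypothetical example.

```python
import math

def ure_stats(bytes_read: float, bits_per_ure: float = 1e14) -> tuple[float, float]:
    """Return (expected UREs, P(at least one URE)) for reading bytes_read bytes.

    Assumes independent per-bit errors at rate 1/bits_per_ure (Poisson model).
    Under the "< 1 in 10^14" reading, both values are upper bounds, not predictions.
    """
    bits = bytes_read * 8
    expected = bits / bits_per_ure
    p_any = -math.expm1(-expected)  # 1 - exp(-expected), accurate for small values
    return expected, p_any

# Hypothetical example: reading 4 TB (4e12 bytes) of surviving data.
expected, p_any = ure_stats(4e12)
print(f"expected UREs: {expected:.2f}, P(>=1 URE): {p_any:.3f}")
```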

> How many data bits are lost by this URE is not important; either you lose 512 bytes or 4096 bytes. The bit error rate is still calculated on the total amount read, regardless of how many bits are lost when this URE happens.
> 
> In data communication (Ethernet, for instance), we say the bit error rate is 10^-12. When you get a bit error, you're going to lose the entire packet. The size of the packet doesn't count in the error rate calculation. I don't see why HDDs would be different.

This is a really good point. Data communication standards are more readily published and must be interoperable. But even the network specs describe BER in "less than X in Y" or "X in Y, max" terms, not as an error occurring every Y bits.
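Here is a sketch of what a "less than X in Y" BER figure does and doesn't tell you about frames, again assuming independent bit errors. Because 1 - (1 - p)^n grows with p, an upper bound on BER yields an upper bound on the frame error probability, and that bound scales with frame size even though the BER itself doesn't. The 10^-12 figure is from your example; 64 and 1518 bytes are the minimum and maximum untagged Ethernet frame sizes.

```python
import math

def frame_error_bound(ber_bound: float, frame_bytes: int) -> float:
    """Upper bound on P(frame has at least one bit error), assuming independent
    bit errors whose per-bit probability is at most ber_bound."""
    bits = frame_bytes * 8
    # 1 - (1 - p)^n via log1p/expm1, which stays accurate when p is tiny
    return -math.expm1(bits * math.log1p(-ber_bound))

BER_BOUND = 1e-12  # "less than 1 in 10^12"
for frame_bytes in (64, 1518):  # minimum and maximum untagged Ethernet frames
    print(frame_bytes, frame_error_bound(BER_BOUND, frame_bytes))
```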

And we also know that packet size does affect error rates, just not by an order of magnitude, as is also the case with HDDs when comparing conventional and AF disks. But an allowance of anything up to (but not including) an order of magnitude is necessarily implied by the less-than sign, or the sign wouldn't be there. It's a continuum; it's not a statement of what will happen on average. It's a statement that errors will occur but won't exceed X errors in Y bits.

> The bit error rate is one thing, the consequence of the bit error and how much data is lost is another thing. You insist that these are directly coupled.

No, I'm saying that the actual breakdown of bits lost leads to an irrational consequence when the spec is read as if the less-than sign weren't there.
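Spelled out as arithmetic under that literal reading: exactly one URE per 10^14 bits regardless of sector size, with each URE costing one whole sector. Only the 10^14 figure comes from the spec; the 10^15-bit (125 TB) read is a hypothetical value. Under these assumptions an AF disk loses 8x as many bytes per bit read as a 512-byte-sector disk.

```python
BITS_PER_URE = 1e14  # spec figure read as an exact rate (the reading in question)

def expected_bytes_lost(bits_read: float, sector_bytes: int) -> float:
    """Expected bytes lost if exactly one URE occurs per BITS_PER_URE bits read,
    independent of sector size, and each URE destroys one whole sector."""
    return bits_read / BITS_PER_URE * sector_bytes

bits = 1e15  # hypothetical: 10^15 bits (125 TB) read
for sector in (512, 4096):
    print(sector, expected_bytes_lost(bits, sector))
```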

> 
>> Fine, but if you accept that the probability of URE is the same between conventional and AF disks, you accept that there are more bits being lost on AF disks than conventional disks. Yet the available data says the opposite is true.
> 
> Where exactly in the available data does it say that?

Drive manufacturers say that AF disks provide better error correction, both because of the larger sector size and because of more ECC bits.

http://www.snia.org/sites/default/files2/SDC2011/presentations/wednesday/CurtisStevens_Advanced_Format_Legacy.pdf

http://storage.toshiba.eu/export/sites/toshiba-sdd/media/downloads/advanced_format/4KWhitePaper_TEG.pdf
", if the data field to be protected in each sector is larger than 512 bytes, the ECC algorithm could be improved to correct for a higher number of bits in error"

http://www.idema.org/wp-content/plugins/download-monitor/download.php?id=1244
page 3, 2nd figure "errors per 512 bytes vs physical block size"
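A rough sketch of why a larger data field can help the ECC, under loudly stated assumptions: independent raw bit errors (real media errors are often bursty), a code that corrects up to t bit errors per codeword, and ECC bits ignored. RAW_BER, T_512 and T_4K are made-up illustrative values, not figures from the papers above. With these particular values, the single 4096-byte codeword has the lower failure probability even though it corrects fewer bits in total (60 vs 8 x 12 = 96).

```python
import math

def log_binom_pmf(n: int, k: int, p: float) -> float:
    """Natural log of the Binomial(n, p) probability mass at k."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def p_uncorrectable(data_bytes: int, t: int, raw_ber: float, extra_terms: int = 200) -> float:
    """P(more than t raw bit errors in one codeword of data_bytes bytes), i.e. the
    chance a t-error-correcting code fails, assuming independent bit errors.
    The tail is summed directly to avoid computing 1 - (almost 1)."""
    n = data_bytes * 8  # ECC bits ignored for simplicity
    upper = min(n, t + extra_terms)
    return sum(math.exp(log_binom_pmf(n, k, raw_ber)) for k in range(t + 1, upper + 1))

# Hypothetical parameters, for illustration only:
RAW_BER = 1e-3  # raw media bit error rate before ECC
T_512 = 12      # bits correctable per 512-byte codeword
T_4K = 60       # bits correctable per 4096-byte codeword

# Probability of at least one uncorrectable codeword per 4096 bytes of data.
p_legacy = -math.expm1(8 * math.log1p(-p_uncorrectable(512, T_512, RAW_BER)))
p_af = p_uncorrectable(4096, T_4K, RAW_BER)
print(f"8 x 512-byte codewords: {p_legacy:.3e}")
print(f"1 x 4096-byte codeword: {p_af:.3e}")
```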

There's loads of information on this…

Chris Murphy

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html