RE: Adding compression/checksum support for bluestore.

> -----Original Message-----
> From: Gregory Farnum [mailto:gfarnum@xxxxxxxxxx]
> Sent: Friday, April 01, 2016 7:51 PM
> To: Allen Samuels <Allen.Samuels@xxxxxxxxxxx>
> Cc: Chris Dunlop <chris@xxxxxxxxxxxx>; Sage Weil <sage@xxxxxxxxxxxx>;
> Igor Fedotov <ifedotov@xxxxxxxxxxxx>; ceph-devel <ceph-
> devel@xxxxxxxxxxxxxxx>
> Subject: Re: Adding compression/checksum support for bluestore.
> 
> On Fri, Apr 1, 2016 at 7:23 PM, Allen Samuels <Allen.Samuels@xxxxxxxxxxx>
> wrote:
> > Talk about mental failures. The first statement is correct. It's about the ratio
> > of checksum to data bits. After that please ignore. If you double the data you
> > need to double the checksum bits to maintain the BER.
> 
> Forgive me if I'm wrong here — I haven't done anything with checksumming
> since I graduated college — but good checksumming is about probabilities
> and people suck at evaluating probability: I'm really not sure any of the
> explanations given in this thread are right. Bit errors aren't random and in
> general it requires a lot more than one bit flip to collide a checksum, so I don't
> think it's a linear relationship between block size and chance of error. Finding
> collisions with cryptographic hashes is hard! Granted a CRC is a lot simpler
> than SHA1 or whatever, but we also aren't facing adversaries with it, just
> random corruption. So yes, as your data block increases then naturally the
> number of possible bit patterns which match the same CRC have to increase
> — but that doesn't mean your odds of actually *getting* that bit pattern by
> mistake increase linearly.

You are correct that bit errors aren't actually random, and it's important that your analysis take this into account. It's my understanding that the CRC family of codes targets a correlated burst-error mode that is appropriate for a bit-serial medium like Ethernet or a hard drive. It's not a good code for a parallel medium like flash (which has a very different set of error distributions).
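
To make the burst-error point concrete: CRC-32 is guaranteed to catch any single error burst no wider than the 32 checksum bits themselves, independent of block size. A toy Python sketch (illustrative only, not Ceph code) that exercises that guarantee:

    import os
    import random
    import zlib

    # Toy check of the CRC burst-error guarantee: corrupt a contiguous run of
    # at most 32 bits anywhere in a 4 KiB block and verify CRC-32 always notices.
    block = bytes(os.urandom(4096))
    good = zlib.crc32(block)

    for _ in range(10000):
        burst_len = random.randint(1, 32)                    # burst no wider than the CRC
        start = random.randint(0, len(block) * 8 - burst_len)
        corrupted = bytearray(block)
        for bit in range(start, start + burst_len):
            corrupted[bit // 8] ^= 1 << (bit % 8)            # flip each bit in the burst
        assert zlib.crc32(bytes(corrupted)) != good          # burst must be detected

The flip side is that errors scattered across the block (the flash-like case) only get the generic ~2^-32 protection, which is the regime discussed below.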

Our problem is a bit different, as we're operating downstream of the error-correcting code in the underlying hardware. That analysis is yet again different and waaaaay out of my league.

Essentially our problem is to look at the pattern of undetected AND uncorrected errors that make it past the unknown media error-correcting code. While I'm sure that this error pattern isn't truly random, I'll bet that a random approximation model isn't too far off from accurate.
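
By "random approximation model" I mean: assume a corruption scrambles the block into something that looks random to the checksum, so a k-bit checksum passes corrupt data with probability about 2^-k. Back-of-the-envelope sketch (same caveat, illustrative Python):

    # Back-of-the-envelope: if corrupted data looks random to the checksum,
    # a k-bit checksum lets a given corruption slip through with probability ~2^-k.
    def undetected_fraction(checksum_bits):
        return 2.0 ** -checksum_bits

    for k in (16, 32, 64):
        print("%2d-bit checksum: ~%.1e of corruptions pass undetected"
              % (k, undetected_fraction(k)))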

Typical enterprise-class storage drives are specified at about a 10^-15 UBER (uncorrectable bit error rate). The undetectable-and-uncorrectable rate (UUBER?) will be somewhat less than that, but we can use this value as an upper bound. Then apply the extra protection of whatever number of checksum bits we're providing, and maybe guardband that by another 0.5 to 1 order of magnitude, and I think you'll have a UUBER that you can feel pretty good about (for the checksum and HW combination).
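
Roughly, the arithmetic I have in mind looks like the following (the numbers are placeholder assumptions, not vendor specs):

    # Rough UUBER arithmetic under the random-model assumption above.
    # All numbers are illustrative assumptions, not vendor specs.
    UBER = 1e-15              # uncorrectable bit error rate (used as an upper bound)
    checksum_bits = 32        # e.g. crc32c
    guardband = 10 ** 0.5     # an extra half order of magnitude of margin
    block_bits = 4096 * 8     # one 4 KiB block

    # Chance the block comes back with at least one uncorrectable error,
    # then slips past the checksum, with the guardband applied on top.
    p_block_bad = 1 - (1 - UBER) ** block_bits    # ~= UBER * block_bits for small UBER
    p_undetected = p_block_bad * 2.0 ** -checksum_bits * guardband
    print("~%.1e undetected-bad-block probability per 4 KiB read" % p_undetected)

With those assumed inputs it comes out around 10^-20 per 4 KiB read, which is exactly the sort of number I'd want the ECC experts to sanity-check.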

I will get some more information from the actual ECC experts and try to generate some actual computations here. Stay tuned....

> 
> I spent a brief time trying to read up on Hamming distances and "minimum
> distance separable" to try and remember/understand this and it's just
> making my head hurt, so hopefully somebody with the right math
> background can chime in.
> -Greg



