RE: Adding compression/checksum support for bluestore.

Allen Samuels <Allen.Samuels@xxxxxxxxxxx> · Mon, 4 Apr 2016 17:56:47 +0000

> -----Original Message-----
> From: Chris Dunlop [mailto:chris@xxxxxxxxxxxx]
> Sent: Monday, April 04, 2016 8:27 AM
> To: Gregory Farnum <gfarnum@xxxxxxxxxx>
> Cc: Allen Samuels <Allen.Samuels@xxxxxxxxxxx>; Sage Weil
> <sage@xxxxxxxxxxxx>; Igor Fedotov <ifedotov@xxxxxxxxxxxx>; ceph-
> devel <ceph-devel@xxxxxxxxxxxxxxx>
> Subject: Re: Adding compression/checksum support for bluestore.
> 
> On Fri, Apr 01, 2016 at 11:18:05PM -0700, Gregory Farnum wrote:
> > On Fri, Apr 1, 2016 at 10:05 PM, Chris Dunlop <chris@xxxxxxxxxxxx> wrote:
> >> On Fri, Apr 01, 2016 at 07:51:07PM -0700, Gregory Farnum wrote:
> >>> Forgive me if I'm wrong here — I haven't done anything with
> >>> checksumming since I graduated college — but good checksumming is
> >>> about probabilities and people suck at evaluating probability: I'm
> >>> really not sure any of the explanations given in this thread are
> >>> right. Bit errors aren't random and in general it requires a lot
> >>> more than one bit flip to collide a checksum, so I don't think it's
> >>> a linear relationship between block size and chance of error.
> >>> Finding
> >>
> >> A single bit flip can certainly result in a checksum collision, with
> >> the same chance as any other error, i.e. 1 in 2^number_of_checksum_bits.
> >
> > That's just not true. I'll quote from
> > https://en.m.wikipedia.org/wiki/Cyclic_redundancy_check#Introduction
> >
> >> Typically an n-bit CRC applied to a data block of arbitrary length will
> >> detect any single error burst not longer than n bits and will detect a
> >> fraction 1 − 2^(−n) of all longer error bursts.
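To make the burst guarantee concrete, here is a minimal sketch (my own illustration, not anything in bluestore) using Python's zlib.crc32, i.e. the 32-bit IEEE polynomial. It flips random bursts of at most 32 bits in a 4 KiB block and counts how many go undetected. It assumes bits are numbered LSB-first within each byte, which is the order in which zlib's reflected CRC-32 consumes them; with a different bit numbering, a "burst" that crosses a byte boundary is not necessarily contiguous from the CRC's point of view, so the guarantee need not hold.

```python
import random
import zlib


def flip_burst(data: bytes, start: int, length: int, rng: random.Random) -> bytes:
    """Corrupt a burst of `length` bits starting at bit `start`.

    Assumption: bits are numbered LSB-first within each byte, matching the
    order zlib's reflected CRC-32 processes them, so the burst is contiguous
    for the CRC. The first and last bits are always flipped; the bits in
    between are flipped at random.
    """
    out = bytearray(data)
    for k in range(length):
        if k in (0, length - 1) or rng.random() < 0.5:
            bit = start + k
            out[bit // 8] ^= 1 << (bit % 8)
    return bytes(out)


def undetected_bursts(block_size: int, max_len: int, trials: int, seed: int = 0) -> int:
    """Count random bursts of at most `max_len` bits that CRC-32 fails to catch."""
    rng = random.Random(seed)
    data = bytes(rng.randrange(256) for _ in range(block_size))
    good = zlib.crc32(data)
    misses = 0
    for _ in range(trials):
        length = rng.randint(1, max_len)
        start = rng.randrange(block_size * 8 - length + 1)
        if zlib.crc32(flip_burst(data, start, length, rng)) == good:
            misses += 1
    return misses


print(undetected_bursts(block_size=4096, max_len=32, trials=10000))  # 0
```

Any nonzero count would contradict the guarantee; bursts longer than 32 bits, by contrast, are only caught with probability 1 − 2^(−32).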
> 
> Ouch, got me! :-)
> 
> In my defense, I was talking about checksums in general rather than
> specifically CRC. But no, I wasn't actually aware that CRCs provide guaranteed
> detection of single n-bit error bursts.
> 
> That changes my contention that block size doesn't matter. My contention
> was based on a (generic, good) checksum being equally good at picking up
> multiple errors as single errors. However this property of CRCs means that, if
> you have multiple errors further apart than the n-bits of a CRC checksum,
> you lose your guarantee of detection (although you still have a very, very
> good chance of detection). So in a larger block you're more likely to see
> another error further away, and thus more likely to get an undetected error.
> 
> Well, I've learnt something!
> 
> Thanks all, for your patience.
> 
> Chris

I would contend that CRCs are probably not the best algorithm for us to use.
CRCs are targeted at burst-error detection to the detriment of other error profiles.
Our error profile is almost certainly NOT a burst-error profile. 

(1) Our physical-media error profile will be whatever is NOT caught by the error-correction scheme associated with that media. I don't know what that error profile looks like, and it probably won't be constant across different types of HW (for example, there are several different classes of error-correcting algorithms for flash, which have very different failure profiles).

(2) Often the failure won't be HW but SW, where the entire block is just wrong (i.e., a control-path error).

I suspect that a cryptographic checksum is what we want to use, i.e., something designed so that every input bit contributes evenly to the overall checksum.
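One way to see what "every bit contributes evenly" means is the avalanche property: flipping any single input bit of a cryptographic hash should flip, on average, about half of its output bits. A minimal sketch, using hashlib.sha256 purely as an example of a cryptographic hash (not a proposal for which one we'd pick), measured over every single-bit flip of an arbitrary 256-byte block:

```python
import hashlib


def avalanche(data: bytes) -> float:
    """Average number of SHA-256 output bits that change per single-bit input flip."""
    base = int.from_bytes(hashlib.sha256(data).digest(), "big")
    total = 0
    for bit in range(len(data) * 8):
        corrupted = bytearray(data)
        corrupted[bit // 8] ^= 1 << (bit % 8)
        changed = int.from_bytes(hashlib.sha256(corrupted).digest(), "big")
        total += bin(base ^ changed).count("1")  # Hamming distance of the digests
    return total / (len(data) * 8)


print(avalanche(b"bluestore block!" * 16))  # close to 128, i.e. half of 256 bits
```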

But the algorithm is easily parameterizable, and it's trivial to provide a number of them, so it's probably not worth too much discussion now.
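Since the choice is easy to parameterize, one way to picture it is a table from algorithm name to checksum function. The sketch below is hypothetical: the names in CHECKSUMS and the checksum_blocks/verify_blocks helpers are made up for illustration and are not bluestore's interface. It uses zlib.crc32, zlib.adler32 and hashlib.sha256 from Python's standard library, plus SHA-256 truncated to 64 bits as an example of trading strength for metadata space.

```python
import hashlib
import zlib
from typing import Callable, Dict, List

# Hypothetical registry: algorithm name -> function returning checksum bytes.
# Illustrative only; not bluestore's actual API or algorithm list.
CHECKSUMS: Dict[str, Callable[[bytes], bytes]] = {
    "crc32": lambda b: zlib.crc32(b).to_bytes(4, "little"),
    "adler32": lambda b: zlib.adler32(b).to_bytes(4, "little"),
    "sha256": lambda b: hashlib.sha256(b).digest(),
    # Cryptographic hash truncated to 64 bits to save metadata space.
    "sha256-64": lambda b: hashlib.sha256(b).digest()[:8],
}


def checksum_blocks(data: bytes, algo: str, block_size: int) -> List[bytes]:
    """Checksum `data` in fixed-size blocks with the chosen algorithm."""
    fn = CHECKSUMS[algo]
    return [fn(data[i:i + block_size]) for i in range(0, len(data), block_size)]


def verify_blocks(data: bytes, algo: str, block_size: int, stored: List[bytes]) -> List[int]:
    """Return the indices of blocks whose checksum no longer matches."""
    current = checksum_blocks(data, algo, block_size)
    return [i for i, (a, b) in enumerate(zip(current, stored)) if a != b]


data = bytes(range(256)) * 64                 # 16 KiB
sums = checksum_blocks(data, "sha256-64", 4096)
bad = bytearray(data)
bad[5000] ^= 0x01                             # flip one bit in block 1
print(verify_blocks(bytes(bad), "sha256-64", 4096, sums))  # [1]
```

Truncating a cryptographic hash keeps its even mixing, but a random corruption then slips through with probability of roughly 2^(−64) rather than 2^(−256).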

> 
> > And over (at least) the ranges they're designed for, it's even better:
> > they provide guarantees about how many bits (in any combination or
> > arrangement) must be flipped before they can have a false match. (It
> > says "typically" because CRCs are a wide family and yes, you do have
> > to select the right ones in the right ways in order to get the desired
> > effects.)
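The "guarantees over the designed range" point can be checked exhaustively on a small block. A minimal sketch, again assuming zlib.crc32 and an arbitrary 16-byte test block: it flips every single bit and every pair of bits and counts corruptions that leave the CRC unchanged. CRC-32 catches all such 1- and 2-bit errors at this length, so the expected count is 0; how far this kind of guarantee extends depends on the polynomial and the block length, which is exactly the "select the right ones" caveat.

```python
import itertools
import zlib


def undetected_small_errors(data: bytes, max_bits: int) -> int:
    """Flip every combination of 1..max_bits bits in `data` and count how many
    corruptions leave the CRC-32 unchanged."""
    good = zlib.crc32(data)
    nbits = len(data) * 8
    misses = 0
    for k in range(1, max_bits + 1):
        for positions in itertools.combinations(range(nbits), k):
            corrupted = bytearray(data)
            for bit in positions:
                corrupted[bit // 8] ^= 1 << (bit % 8)
            if zlib.crc32(corrupted) == good:
                misses += 1
    return misses


print(undetected_small_errors(b"0123456789abcdef", max_bits=2))  # 0
```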
> >
> > As Allen says, flash may require something different, but it will be
> > similar. Getting the people who actually understand this is definitely
> > the way to go — it's an active research field but I think over the
> > ranges we're interested in it's a solved problem. And certainly if we
> > try and guess about things based on our intuition, we *will* get it
> > wrong. So somebody interested in this feature set needs to go out and
> > do the reading or talk to the right people, please! :) -Greg