> -----Original Message-----
> From: Chris Dunlop [mailto:chris@xxxxxxxxxxxx]
> Sent: Thursday, April 07, 2016 2:52 AM
> To: Allen Samuels <Allen.Samuels@xxxxxxxxxxx>
> Cc: Sage Weil <sage@xxxxxxxxxxxx>; Igor Fedotov <ifedotov@xxxxxxxxxxxx>; ceph-devel <ceph-devel@xxxxxxxxxxxxxxx>
> Subject: Re: Adding compression/checksum support for bluestore.
>
> On Thu, Apr 07, 2016 at 12:59:45PM +1000, Chris Dunlop wrote:
> > On Thu, Apr 07, 2016 at 12:52:48AM +0000, Allen Samuels wrote:
> > > So, what started this entire thread was Sage's suggestion that for
> > > HDD we would want to increase the size of the block under
> > > management. So if we assume something like a 32-bit checksum on a
> > > 128Kbyte block being read from 5ZB, then the odds become:
> > >
> > > 1 - (2^-32 * (1-(10^-15))^(128 * 8 * 1024) - 2^-32 + 1) ^ ((5 * 8 * 10^21) / (4 * 8 * 1024))
> > >
> > > Which is
> > >
> > > 0.257715899051042299960931575773635333355380139960141052927
> > >
> > > Which is 25%. A big jump ---> That's my point :)
> >
> > Oops, you missed adjusting the second checksum term, it should be:

Merde. Right.

> > 1 - (2^-32 * (1-(10^-15))^(128 * 8 * 1024) - 2^-32 + 1) ^ ((5 * 8 * 10^21) / (128 * 8 * 1024)) =
> > 0.009269991973796787500153031469968391191560327904558440721
> >
> > ...which is different to the 4K block case starting at the 12th digit. I.e. not very different.
>
> Oh, that's interesting, I didn't notice this before... truncating the results at the 12th decimal:
>
> 0.009269991978  4K blocks
> 0.009269991973  128K blocks
>
> ...we see the probability of getting bad data is slightly _higher_ with 4K blocks
> than with 128K blocks. I suspect this is because:

Yes, my analysis was incomplete: I was only looking at the error rate on a per-I/O basis, not at the system-level effect of many I/O operations.

And yes, as you increase the block size you approach the checksum limit, i.e. 2^-32, as the probability. If we set D = N (in our example), the result becomes the checksum silent error rate: we're doing one I/O which is almost certain to contain an error, but there's only a 2^-32 chance of a silent error slipping through. That's about 10^-10, much lower than 0.00926...

> On Fri, Apr 01, 2016 at 04:28:38PM +1100, Chris Dunlop wrote:
> > In fact, if you have a stream of data subject to some BER and split
> > into checksummed blocks, the larger the blocks and thereby the lower
> > the number of blocks, the lower the chance of a false match.
>
> Chris
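
For anyone who wants to reproduce the two figures above, here is a minimal sketch (not from the thread or from any Ceph code) that evaluates the same expression with Python's mpmath library at high precision; the function and variable names (p_silent_bad_read, ber, csum_bits, total_bytes) are purely illustrative.

    # Probability of returning silently corrupted data over a large volume of reads,
    # using the per-block formula discussed above:
    #   p_block_ok = 1 - (1 - (1 - ber)^block_bits) * 2^-csum_bits
    #   p_overall  = 1 - p_block_ok ^ (total_bits / block_bits)
    from mpmath import mp, mpf

    mp.dps = 60  # enough digits to see where the 4K and 128K results diverge

    def p_silent_bad_read(block_bytes,
                          csum_bits=32,                      # 32-bit checksum
                          ber=mpf(10) ** -15,                # bit error rate
                          total_bytes=mpf(5) * 10 ** 21):    # 5 ZB read in total
        block_bits = mpf(block_bytes) * 8
        n_blocks = (total_bytes * 8) / block_bits
        p_block_clean = (1 - ber) ** block_bits    # no bit error in the block
        p_false_match = mpf(2) ** -csum_bits       # checksum matches despite an error
        # per-block probability that we do NOT return silently bad data
        p_block_ok = 1 - (1 - p_block_clean) * p_false_match
        return 1 - p_block_ok ** n_blocks

    for size in (4 * 1024, 128 * 1024):
        print(size // 1024, "KiB blocks:", mp.nstr(p_silent_bad_read(size), 20))

Since it is the same formula, it should reproduce the 0.009269991978... (4K) and 0.009269991973... (128K) values quoted above, diverging at the 12th decimal place.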