> -----Original Message-----
> From: Chris Dunlop [mailto:chris@xxxxxxxxxxxx]
> Sent: Friday, April 01, 2016 10:05 PM
> To: Gregory Farnum <gfarnum@xxxxxxxxxx>
> Cc: Allen Samuels <Allen.Samuels@xxxxxxxxxxx>; Sage Weil <sage@xxxxxxxxxxxx>; Igor Fedotov <ifedotov@xxxxxxxxxxxx>; ceph-devel <ceph-devel@xxxxxxxxxxxxxxx>
> Subject: Re: Adding compression/checksum support for bluestore.
>
> On Fri, Apr 01, 2016 at 07:51:07PM -0700, Gregory Farnum wrote:
> > On Fri, Apr 1, 2016 at 7:23 PM, Allen Samuels <Allen.Samuels@xxxxxxxxxxx> wrote:
> >> Talk about mental failures. The first statement is correct. It's about the ratio of checksum to data bits. After that please ignore. If you double the data you need to double the checksum bits to maintain the BER.
> >
> > Forgive me if I'm wrong here — I haven't done anything with
> > checksumming since I graduated college — but good checksumming is
> > about probabilities and people suck at evaluating probability: I'm
> > really not sure any of the explanations given in this thread are
> > right. Bit errors aren't random and in general it requires a lot more
> > than one bit flip to collide a checksum, so I don't think it's a
> > linear relationship between block size and chance of error. Finding
>
> A single bit flip can certainly result in a checksum collision, with the same
> chance as any other error, i.e. 1 in 2^number_of_checksum_bits.
>
> Just to clarify: the chance of encountering an error is linear with the block
> size. I'm contending that the chance of encountering a checksum collision as a
> result of encountering one or more errors is independent of the block size.
>
> > collisions with cryptographic hashes is hard! Granted a CRC is a lot
> > simpler than SHA1 or whatever, but we also aren't facing adversaries
> > with it, just random corruption. So yes, as your data block increases
> > then naturally the number of possible bit patterns which match the
> > same CRC has to increase — but that doesn't mean your odds of
> > actually *getting* that bit pattern by mistake increase linearly.
>
> A (good) checksum is like rolling a 2^"number of bits in the checksum"-sided
> dice across a rough table, in an ideal world where every single parameter is
> known. If you launch your dice in precisely the same way, the dice will
> behave exactly the same way, hitting the same hills and valleys in the table,
> and end up in precisely the same spot with precisely the same face on top -
> your checksum. The number of data bits is how hard you roll the dice:
> how far it goes and how many hills and valleys it hits along the way.
>
> One or more data errors (bit flips or whatever) is then equivalent to changing
> one or more of the hills or valleys: a very small difference, but encountering
> the difference puts the dice on a completely different path, thereafter
> hitting completely different hills and valleys to the original path. And which
> face is on top when your dice stops is a matter of chance (well... not really: if
> you did exactly the same again, it would end up taking precisely the same
> path and the same face would be on top when it stops).
>
> The thing is, it doesn't matter how many hills and valleys (data bits) it hits
> along the way: the chance of getting a specific face up is always the same, i.e.
> 1 / number_of_faces == 1 / 2^number_of_checksum_bits.
>
> Chris

I think you're defining BER as the odds of a read operation silently delivering wrong data, whereas I'm defining BER as the odds of an individual bit being read incorrectly.
When we have a false positive, you count "1" failure but I count "Block" number of failures. I'm not claiming that either of us is "correct"; I'm just trying to understand our positions. Do you agree with this?
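
For what it's worth, here is a rough back-of-the-envelope sketch of how the two accountings compare. The numbers are made up (the per-bit BER, the 32-bit checksum width, and the "ideal checksum misses a corruption with probability 2^-c" model are my assumptions, not anything established in this thread):

```python
# Hypothetical comparison of the two framings above. Assumptions (mine, not
# from the thread): independent random bit errors at rate BER, and an ideal
# c-bit checksum that misses any given corruption with probability 2**-c.

BER = 1e-15          # assumed per-bit error rate
CSUM_BITS = 32       # assumed checksum width in bits

def p_block_error(block_bits):
    """Probability that a read of block_bits bits delivers >= 1 flipped bit."""
    return 1.0 - (1.0 - BER) ** block_bits

def p_silent_per_read(block_bits):
    """Per-read framing: probability a read is corrupted AND the checksum
    fails to catch it (a collision)."""
    return p_block_error(block_bits) * 2.0 ** -CSUM_BITS

def silent_bits_per_read(block_bits):
    """Per-bit framing: expected number of wrongly-delivered bits per read,
    counting every bit of an undetected bad block as a failure."""
    return p_silent_per_read(block_bits) * block_bits

for kib in (4, 64, 1024):
    n = kib * 1024 * 8
    print(f"{kib:5d} KiB: P(error)={p_block_error(n):.3e}  "
          f"P(silent read)={p_silent_per_read(n):.3e}  "
          f"E[silent bits]={silent_bits_per_read(n):.3e}")
```

Under these assumptions the conditional collision probability stays at 2^-32 no matter the block size, the per-read silent-corruption probability grows linearly with block size, and the "wrong bits delivered" count grows with the square of the block size, which is essentially the difference between the two definitions being discussed.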