On Fri, Apr 1, 2016 at 7:23 PM, Allen Samuels <Allen.Samuels@xxxxxxxxxxx> wrote:
> Talk about mental failures. The first statement is correct. It's about
> the ratio of checksum to data bits. After that please ignore. If you
> double the data you need to double the checksum bits to maintain the BER.

Forgive me if I'm wrong here (I haven't done anything with checksumming since I graduated college), but good checksumming is about probabilities, and people suck at evaluating probability: I'm really not sure any of the explanations given in this thread are right.

Bit errors aren't random, and in general it takes a lot more than one bit flip to collide a checksum, so I don't think the relationship between block size and the chance of an undetected error is linear. Finding collisions with cryptographic hashes is hard! Granted, a CRC is a lot simpler than SHA-1 or whatever, but we also aren't facing adversaries with it, just random corruption.

So yes, as your data block grows, the number of possible bit patterns that map to the same CRC has to grow too -- but that doesn't mean your odds of actually *getting* one of those patterns by mistake grow linearly with block size.

I spent a brief time trying to read up on Hamming distances and "maximum distance separable" codes to try to remember/understand this, and it's just making my head hurt, so hopefully somebody with the right math background can chime in.

-Greg
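
For what it's worth, here is a rough way to test that intuition empirically: corrupt copies of random blocks of a few sizes and count how often a deliberately weak checksum fails to notice. A full 32-bit CRC collides far too rarely to observe in a quick run, so this sketch keeps only the low 8 bits of zlib.crc32; the block sizes, trial count, and bit-flip corruption model are arbitrary choices for illustration, not anything Ceph actually does.

# Rough sanity check of the "undetected errors don't scale linearly with
# block size" intuition: flip random bits in copies of a random block and
# count how often a weak checksum misses the change.  Only the low 8 bits
# of zlib.crc32 are kept so that collisions are frequent enough to observe.

import os
import random
import zlib

CHECK_BITS = 8                       # width of the (deliberately weak) check
MASK = (1 << CHECK_BITS) - 1
TRIALS = 50_000
FLIPS = 8                            # random bit flips per corrupted copy

def checksum(data: bytes) -> int:
    return zlib.crc32(data) & MASK

def corrupt(data: bytes) -> bytes:
    """Return a copy of `data` with FLIPS randomly chosen bits flipped."""
    buf = bytearray(data)
    for _ in range(FLIPS):
        i = random.randrange(len(buf))
        buf[i] ^= 1 << random.randrange(8)
    return bytes(buf)

for block_size in (64, 1024, 16384):
    block = os.urandom(block_size)
    good = checksum(block)
    undetected = 0
    for _ in range(TRIALS):
        bad = corrupt(block)
        if bad != block and checksum(bad) == good:
            undetected += 1
    print(f"{block_size:6d}-byte blocks: {undetected / TRIALS:.5f} undetected "
          f"(~{1 / (1 << CHECK_BITS):.5f} expected for an "
          f"{CHECK_BITS}-bit check)")

If the intuition is right, the undetected rate should come out near 1/256 for every block size here, i.e. roughly 2^-k for a k-bit check, rather than growing with the size of the block.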