On Fri, Apr 01, 2016 at 07:51:07PM -0700, Gregory Farnum wrote:
> On Fri, Apr 1, 2016 at 7:23 PM, Allen Samuels <Allen.Samuels@xxxxxxxxxxx> wrote:
>> Talk about mental failures. The first statement is correct. It's about
>> the ratio of checksum to data bits. After that please ignore. If you
>> double the data you need to double the checksum bit to maintain the ber.
>
> Forgive me if I'm wrong here - I haven't done anything with
> checksumming since I graduated college - but good checksumming is
> about probabilities and people suck at evaluating probability: I'm
> really not sure any of the explanations given in this thread are
> right. Bit errors aren't random and in general it requires a lot more
> than one bit flip to collide a checksum, so I don't think it's a
> linear relationship between block size and chance of error. Finding

A single bit flip can certainly result in a checksum collision, with the
same chance as any other error, i.e. 1 in 2^number_of_checksum_bits.

Just to clarify: the chance of encountering an error is linear with the
block size. I'm contending that the chance of encountering a checksum
collision as a result of encountering one or more errors is independent
of the block size.

> collisions with cryptographic hashes is hard! Granted a CRC is a lot
> simpler than SHA1 or whatever, but we also aren't facing adversaries
> with it, just random corruption. So yes, as your data block increases
> then naturally the number of possible bit patterns which match the
> same CRC have to increase - but that doesn't mean your odds of
> actually *getting* that bit pattern by mistake increase linearly.

A (good) checksum is like rolling a 2^"number of bits in the
checksum"-sided dice across a rough table, in an ideal world where every
single parameter is known.
If you launch your dice in precisely the same way, the dice will behave
exactly the same way, hitting the same hills and valleys in the table,
and end up in precisely the same spot with precisely the same face on
top - your checksum.

The number of data bits is how hard you roll the dice: how far it goes
and how many hills and valleys it hits along the way.

One or more data errors (bit flips or whatever) is then equivalent to
changing one or more of the hills or valleys: a very small difference,
but encountering the difference puts the dice on a completely different
path, thereafter hitting completely different hills and valleys to the
original path.

And which face is on top when your dice stops is a matter of chance
(well... not really: if you did exactly the same again, it would end up
taking precisely the same path and the same face would be on top when it
stops).

The thing is, it doesn't matter how many hills and valleys (data bits)
it hits along the way: the chance of getting a specific face up is
always the same, i.e. 1 / number_of_faces == 1 / 2^number_of_checksum_bits.

Chris
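
P.S. For anyone who would rather measure this than trust the dice, here
is a rough Python sketch of the experiment. It is only an illustration
under several assumptions of mine: the checksum is shrunk to 8 bits so
that collisions show up in a short run, a truncated BLAKE2b digest stands
in for a "good" checksum (it is not CRC32C or anything Ceph actually
uses), and an "error" is modelled as 1 to 8 random bit flips. The names
checksum(), collision_rate() and run() are made up for this example.

```python
import hashlib
import random

CHECKSUM_BITS = 8  # assumption: deliberately tiny so collisions are observable


def checksum(data: bytes) -> int:
    """Stand-in for a 'good' checksum: BLAKE2b truncated to one byte."""
    return hashlib.blake2b(data, digest_size=CHECKSUM_BITS // 8).digest()[0]


def collision_rate(block_size: int, trials: int, rng: random.Random) -> float:
    """Fraction of corrupted blocks whose checksum still matches the original."""
    collisions = 0
    for _ in range(trials):
        # Fresh random block of block_size bytes for every trial.
        original = rng.getrandbits(8 * block_size).to_bytes(block_size, "little")
        corrupted = bytearray(original)
        # Flip 1..8 distinct bits, so the corrupted block always differs.
        flips = rng.sample(range(8 * block_size), rng.randint(1, 8))
        for bit in flips:
            corrupted[bit // 8] ^= 1 << (bit % 8)
        if checksum(bytes(corrupted)) == checksum(original):
            collisions += 1
    return collisions / trials


def run(trials: int = 20000, seed: int = 1) -> dict:
    """Measure the collision rate for a few block sizes (in bytes)."""
    rng = random.Random(seed)
    return {size: collision_rate(size, trials, rng) for size in (64, 512, 4096)}


if __name__ == "__main__":
    print("expected: %.5f" % (1 / 2 ** CHECKSUM_BITS))
    for size, rate in run().items():
        print("%5d bytes: %.5f" % (size, rate))
```

If the argument above holds, the measured rate should hover around
1/256 (about 0.0039) for every block size rather than growing with it.
Note that a real CRC is not a random function: it guarantees detection
of some small error patterns (every single-bit error, for instance),
which this sketch does not model.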