On Fri, 1 Apr 2016, Chris Dunlop wrote:
> On Wed, Mar 30, 2016 at 10:52:37PM +0000, Allen Samuels wrote:
> > One thing to also factor in is that if you increase the span of a
> > checksum, you degrade the quality of the checksum. So if you go with 128K
> > chunks of data you'll likely want to increase the checksum itself to
> > something beyond a CRC-32. Maybe somebody out there has a good way of
> > describing this quantitatively.
>
> I would have thought the "quality" of a checksum would be a function of how
> many bits it is, and how evenly and randomly it's distributed, and unrelated
> to the amount of data being checksummed.
>
> I.e. if you have any amount of data covered by an N-bit evenly randomly
> distributed checksum, and "something" goes wrong with the data (or the
> checksum), the chance of the checksum still matching the data is 1 in 2^N.

Say there is some bit error rate per bit. If you double the amount of
data you're checksumming, then each checksummed chunk will see twice as
many errors. That means that even though your 32-bit checksum catches a
bad chunk 2^32-1 times out of 2^32, you're twice as likely to hit that
1 in 2^32 chance of getting a correct checksum on wrong data.

sage
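
To put rough numbers on the argument above, here is a minimal sketch in
Python. It rests on assumptions that are not from the thread: bit errors
are independent, the bit error rate (`ber`, set to 1e-15 here) is a
made-up illustrative value, and the checksum is ideal, so an N-bit
checksum misses a corrupted chunk with probability 2^-N. The function
name and the chunk sizes are hypothetical.

```python
import math


def undetected_error_probability(chunk_bytes, ber, checksum_bits=32):
    """Probability that a chunk is corrupted AND an ideal checksum still matches."""
    n_bits = chunk_bytes * 8
    # P(at least one flipped bit) = 1 - (1 - ber)^n_bits.
    # log1p/expm1 keep this accurate when ber is extremely small.
    p_corrupt = -math.expm1(n_bits * math.log1p(-ber))
    # Assumption: an ideal N-bit checksum misses a corrupted chunk 1 time in 2^N.
    p_miss = 2.0 ** -checksum_bits
    return p_corrupt * p_miss


if __name__ == "__main__":
    ber = 1e-15  # illustrative bit error rate, not a measured value
    for size in (4 * 1024, 64 * 1024, 128 * 1024):
        print(size, undetected_error_probability(size, ber))
```

Under those assumptions, and while the per-chunk error probability is
small, doubling the chunk size roughly doubles the per-chunk result, and
each extra checksum bit halves it.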