On Thu, Apr 07, 2016 at 12:59:45PM +1000, Chris Dunlop wrote:
> On Thu, Apr 07, 2016 at 12:52:48AM +0000, Allen Samuels wrote:
> > So, what started this entire thread was Sage's suggestion that for HDD we
> > would want to increase the size of the block under management. So if we
> > assume something like a 32-bit checksum on a 128Kbyte block being read
> > from 5ZB, then the odds become:
> >
> > 1 - (2^-32 * (1-(10^-15))^(128 * 8 * 1024) - 2^-32 + 1) ^ ((5 * 8 * 10^21) / (4 * 8 * 1024))
> >
> > Which is
> >
> > 0.257715899051042299960931575773635333355380139960141052927
> >
> > Which is 25%. A big jump ---> That's my point :)
>
> Oops, you missed adjusting the second checksum term, it should be:
>
> 1 - (2^-32 * (1-(10^-15))^(128 * 8 * 1024) - 2^-32 + 1) ^ ((5 * 8 * 10^21) / (128 * 8 * 1024))
> = 0.009269991973796787500153031469968391191560327904558440721
>
> ...which is different to the 4K block case starting at the 12th digit, i.e. not very different.

Oh, that's interesting, I hadn't noticed this before... truncating the results at the 12th decimal:

0.009269991978   4K blocks
0.009269991973   128K blocks

...we see the probability of getting bad data is slightly _higher_ with 4K blocks than with 128K blocks. I suspect this is because:

On Fri, Apr 01, 2016 at 04:28:38PM +1100, Chris Dunlop wrote:
> In fact, if you have a stream of data subject to some BER and split into
> checksummed blocks, the larger the blocks and thereby the lower the number
> of blocks, the lower the chance of a false match.

Chris
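
P.S. For anyone who wants to play with the numbers, here's a minimal Python sketch (not from the thread; the function and parameter names are my own) that evaluates the same expression using the stdlib decimal module. High precision is needed because the per-block undetected-error probability is around 2e-19, which an ordinary double would round away against 1.

from decimal import Decimal, getcontext

getcontext().prec = 60  # doubles can't represent 1 - ~2e-19, so use high-precision decimals

def p_undetected_bad_read(block_bytes, csum_bits=32, ber=Decimal("1e-15"),
                          total_bytes=Decimal("5e21")):
    # Probability that, when reading total_bytes split into block_bytes-sized
    # checksummed blocks at the given bit error rate, at least one corrupted
    # block slips past its csum_bits-bit checksum.
    bits_per_block = int(Decimal(block_bytes) * 8)
    n_blocks = int(total_bytes * 8 / (Decimal(block_bytes) * 8))
    p_corrupt = 1 - (1 - ber) ** bits_per_block     # >= 1 flipped bit in the block
    p_slip = p_corrupt * Decimal(2) ** -csum_bits   # ...and the checksum still matches
    return 1 - (1 - p_slip) ** n_blocks             # >= 1 such block over the whole read

print(p_undetected_bad_read(4 * 1024))    # 4K blocks   -> 0.009269991978...
print(p_undetected_bad_read(128 * 1024))  # 128K blocks -> 0.009269991973...

Swapping in other block sizes or checksum widths makes it easy to see how the trade-off moves.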