So, what started this entire thread was Sage's suggestion that for HDD we
would want to increase the size of the block under management. So if we
assume something like a 32-bit checksum on a 128Kbyte block being read from
5ZB, then the odds become:

1 - (2^-32 * (1-(10^-15))^(128 * 8 * 1024) - 2^-32 + 1) ^ ((5 * 8 * 10^21) / (4 * 8 * 1024))

Which is

0.257715899051042299960931575773635333355380139960141052927

Which is 25%. A big jump ---> That's my point :)

So if you increase the checksum size by 5 bits you'll get back to where you
were:

1 - (2^-37 * (1-(10^-15))^(128 * 8 * 1024) - 2^-37 + 1) ^ ((5 * 8 * 10^21) / (4 * 8 * 1024))

0.009269991973796787499061899655143549844043842016480142301

Allen Samuels
Software Architect, Fellow, Systems and Software Solutions

2880 Junction Avenue, San Jose, CA 95134
T: +1 408 801 7030| M: +1 408 780 6416
allen.samuels@xxxxxxxxxxx

> -----Original Message-----
> From: Chris Dunlop [mailto:chris@xxxxxxxxxxxx]
> Sent: Wednesday, April 06, 2016 5:43 PM
> To: Allen Samuels <Allen.Samuels@xxxxxxxxxxx>
> Cc: Sage Weil <sage@xxxxxxxxxxxx>; Igor Fedotov
> <ifedotov@xxxxxxxxxxxx>; ceph-devel <ceph-devel@xxxxxxxxxxxxxxx>
> Subject: Re: Adding compression/checksum support for bluestore.
>
> On Wed, Apr 06, 2016 at 06:06:25PM +0000, Allen Samuels wrote:
> >>>> On the bright side, if the 10^-9 formula (Chris #1, Chris #2, Sage)
> >>>> are anywhere near correct, they indicate with a block size of 4K
> >>>> and 32-bit checksum, you'd need to read 5 * 10^21 bits, or 0.5 ZB,
> >>>> to get to a 1% chance of seeing unflagged bad data, e.g.:
> >>
> >> P(bad data) @ U=10^-15, C=32, D=(4 * 8 * 1024), N=(5 * 8 * 10^21)
> >> = 1 - (2^-C * (1-U)^D - 2^-C + 1) ^ (N / D)
> >> = 1 - (2^-32 * (1-(10^-15))^(4 * 8 * 1024) - 2^-32 + 1) ^ ((5 * 8 * 10^21) / (4 * 8 * 1024))
> >> = 0.009269991978483162962573463579660791470065102520727107106
> >> = 0.92%
> >
> > Where does 10^21 come into the equation? I thought we were dealing
> > with 5PB.
>
> ZB rather than PB. But I see my explanatory text still doesn't match the
> numbers actually being plugged into the formula. Sigh.
>
> Corrected explanatory text: with a 4KB block size, 32 bit checksum and BER 1
> x 10^-15, 5 ZB of data gets you close to a 1% chance of seeing unflagged bad
> data.
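
For anyone who wants to re-run the numbers in this thread, here is a minimal
sketch of the formula 1 - (2^-C * (1-U)^D - 2^-C + 1) ^ (N / D) in Python.
The function name p_unflagged and its optional "blocks" argument are made up
for this illustration and do not come from Ceph. It uses log1p/expm1 instead
of the literal powers, on the assumption that plain double arithmetic would
lose precision around 1 - 10^-15. The "blocks" argument exists only so the
128K expressions can be evaluated exactly as written above, where the
exponent still divides by (4 * 8 * 1024).

```python
import math

def p_unflagged(U, C, D, N, blocks=None):
    """Chance that at least one corrupted block passes its checksum.

    U: bit error rate, C: checksum width in bits,
    D: block size in bits, N: total bits read.
    blocks: number of blocks read; defaults to N / D (hypothetical
            override, added only to reproduce expressions as written).
    """
    if blocks is None:
        blocks = N / D
    # P(block has >= 1 bad bit) * P(checksum misses it) = 2^-C * (1 - (1-U)^D)
    p_block = 2.0 ** -C * -math.expm1(D * math.log1p(-U))
    # 1 - (1 - p_block)^blocks, computed without cancellation
    return -math.expm1(blocks * math.log1p(-p_block))

U = 1e-15
N = 5 * 8 * 10**21        # 5 ZB expressed in bits
D_4K = 4 * 8 * 1024
D_128K = 128 * 8 * 1024

p_4k = p_unflagged(U, 32, D_4K, N)                              # about 0.00927
p_128k_written = p_unflagged(U, 32, D_128K, N, blocks=N / D_4K)  # about 0.2577
p_37_written = p_unflagged(U, 37, D_128K, N, blocks=N / D_4K)    # about 0.00927
p_128k_same_d = p_unflagged(U, 32, D_128K, N)                   # about 0.00927

for label, p in [("4K, C=32", p_4k),
                 ("128K, C=32, as written", p_128k_written),
                 ("128K, C=37, as written", p_37_written),
                 ("128K, C=32, blocks = N / D", p_128k_same_d)]:
    print(f"{label}: {p:.12f}")
```

One thing the sketch makes visible: if the block count is taken as N / D with
the same 128K D used inside the parentheses, the C=32 result comes out at
roughly 0.93% again rather than 25%, because the per-block probability grows
with D while the number of blocks shrinks by the same factor. Which reading
was intended is for the thread to settle.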