On Mon, 4 Apr 2016, Allen Samuels wrote:
> But there's an approximation that gets the job done for us.
>
> When U is VERY SMALL (this will always be true for us :)), you can
> approximate 1-(1-U)^D as D * U. (For even modest values of U (say
> 10^-5), this is a very good approximation.)
>
> Now the math is easy.
>
> The odds of failure for reading a block of size D is now D * U; with
> checksum correction it becomes (D * U) / (2^C).
>
> It's now clear that if you double the data size, you need to add one
> bit to your checksum to compensate.
>
> (Again, the actual math is less than 1 bit, but in the range we care
> about 1 bit will always do it.)
>
> Anyways, that's what we worked out.

So D = block size in bits, U = hw UBER, C = checksum size in bits.

Let's add N = the number of bits you actually want to read. In that case
we have to read N / D blocks of D bits each, and we get

  P(reading N bits and getting some bad data without knowing it)
    = (D * U) / (2^C) * (N / D)
    = U * N / (2^C)

and the D term (block size) drops out. IIUC this is what Chris was
originally getting at: the block size affects the probability of an
error on any one block, but as a user reading something, you don't care
about the block size--you care about how much data you want to read. I
think in that case the block size doesn't really matter (modulo rounding
error, the minimum read size, how precisely we can locate the error,
etc.).

Is that right?

sage
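
For anyone who wants to poke at the numbers, here is a small Python
sketch of the same arithmetic; the values of U, D, C, and N below are
made-up illustrative assumptions, not measurements:

import math

# Illustrative values (assumptions): a commonly quoted hardware UBER,
# a 4 KB block expressed in bits, a 32-bit checksum, 1 TB read.
U = 1e-15          # hw UBER: probability any single bit read is bad
D = 4096 * 8       # block size in bits
C = 32             # checksum size in bits
N = 2**40 * 8      # total bits the user wants to read (1 TB)

# 1 - (1 - U)^D vs. the D * U approximation. expm1/log1p keep the
# "exact" value accurate even though U is tiny.
p_block_exact = -math.expm1(D * math.log1p(-U))
p_block_approx = D * U
print(p_block_exact, p_block_approx)    # agree to many digits

# Per-block undetected-error rate, scaled by the N / D blocks read...
p_undetected_via_blocks = (D * U) / 2**C * (N / D)
# ...equals U * N / 2^C: the block size D cancels out.
p_undetected_direct = U * N / 2**C
print(p_undetected_via_blocks, p_undetected_direct)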