On Mon, 4 Apr 2016, Allen Samuels wrote:
> But there's an approximation that gets the job done for us.
>
> When U is VERY SMALL (this will always be true for us :)), you can
> approximate 1-(1-U)^D as D * U. (For even modest values of U (say
> 10^-5), this is a very good approximation.)
>
> Now the math is easy.
>
> The odds of failure for reading a block of size D is now D * U; with
> checksum correction it becomes (D * U) / (2^C).
>
> It's now clear that if you double the data size, you need to add one
> bit to your checksum to compensate.
>
> (Again, the actual math is less than 1 bit, but in the range we care
> about 1 bit will always do it.)
>
> Anyways, that's what we worked out.

So D = block size in bits, U = hw UBER, C = checksum size in bits.

Let's add N = the number of bits you actually want to read. In that case
we have to read N / D blocks of D bits each, and we get

  P(reading N bits and getting some bad data without knowing it)
    = (D * U) / (2^C) * (N / D)
    = U * N / (2^C)

and the D term (block size) drops out. IIUC this is what Chris was
originally getting at: the block size affects the probability of an
error on any one block, but as a user reading something, you don't care
about the block size--you care about how much data you want to read. I
think in that case the block size doesn't really matter (modulo rounding
error, the minimum read size, how precisely we can locate the error,
etc.).

Is that right?

sage
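
For anyone who wants to poke at the numbers, here is a small Python
sketch of the same arithmetic; the values of U, D, C, and N below are
made-up illustrative assumptions, not measurements:

import math

# Illustrative values (assumptions): a commonly quoted hardware UBER,
# a 4 KB block expressed in bits, a 32-bit checksum, 1 TB read.
U = 1e-15          # hw UBER: probability any single bit read is bad
D = 4096 * 8       # block size in bits
C = 32             # checksum size in bits
N = 2**40 * 8      # total bits the user wants to read (1 TB)

# 1 - (1 - U)^D vs. the D * U approximation. expm1/log1p keep the
# "exact" value accurate even though U is tiny.
p_block_exact = -math.expm1(D * math.log1p(-U))
p_block_approx = D * U
print(p_block_exact, p_block_approx)    # agree to many digits

# Per-block undetected-error rate, scaled by the N / D blocks read...
p_undetected_via_blocks = (D * U) / 2**C * (N / D)
# ...equals U * N / 2^C: the block size D cancels out.
p_undetected_direct = U * N / 2**C
print(p_undetected_via_blocks, p_undetected_direct)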