Re: Adding compression/checksum support for bluestore.

[ Apologies for the tardy continuation of this... ]

On Sat, Apr 02, 2016 at 05:38:23AM +0000, Allen Samuels wrote:
>> -----Original Message-----
>> From: Chris Dunlop [mailto:chris@xxxxxxxxxxxx]
>> 
>> What I think we're talking about is, if you have an error, the probability of the
>> checksum unfortunately still matching, i.e. a false positive: you think you
>> have good data but it's actually crap. And that's a function of the number of
>> checksum bits, and unrelated to the number of data bits. Taken to extremes,
>> you could have a 5PB data system covered by a single 32 bit checksum, and
>> that's no more or less likely to give you a false positive than a 32 bit checksum
>> on a 4K block, and it doesn't change the BER at all.
>> 
>> That said, if you have to throw away 5PB of data and read it again because of
>> a checksum mismatch, your chances of getting another error during the
>> subsequent 5PB read are obviously much higher than your chances of getting
>> another error during a subsequent 4KB read.
>> 
>> Perhaps this is the effect you're thinking about? However this effect is a
>> function of the block size (== data bits) and is independent of the checksum
>> size.
> 
> Here's how I'm thinking about it -- for better or for worse.
> 
> The goal of the checksumming system is to be able to complete a read operation and to be able to make a statement about the odds of the entire read chunk of data being actually correct, i.e., the odds of having zero undetectable uncorrected bit errors for each bit of data in the read operation (UUBER).

Agree.

> A fixed-size checksum reduces the odds for the entire operation by a fixed factor (i.e., 2^(checksum size), assuming a good checksum algorithm).

Agree.

> However, the HW UUBER is a per-bit quotation.

Agree.

> Thus as you read more bits the odds of a UUBER go UP by the number of bits that you're reading.

Disagree. The UUBER stays the same: it's a _rate_. The odds of an error
certainly go up, but the error rate is still the same.

So then:

> If we hold the number of checksum bits constant then this increased HW UUBER is only decreased by a fixed amount. Hence the net UUBER is degraded for larger reads as opposed to smaller reads (again for a fixed-size checksum).
>
> As you point out the odds of a UUBER for a 5PB read are much higher than the odds of a UUBER for a 4K read. My goal is to level that so that ANY read (i.e., EVERY read) has a UUBER that's below a fixed limit.
> 
> That's why I believe that you have to design for a specific checksum-bit / data bit ratio.
> 
> Does this make sense to you?

Sorry, no, it doesn't :-(

On Sat, Apr 02, 2016 at 05:48:01AM +0000, Allen Samuels wrote:
> I think you're defining BER as the odds of a read operation silently
> delivering wrong data. Whereas I'm defining BER as the odds of an
> individual bit being read incorrectly. When we have a false positive you
> count "1" failure but I count "Block" number of failures.
>
> I'm not claiming that either of us is "correct". I'm just trying to
> understand our positions.
>
> Do you agree with this?

I agree BER is the odds of an individual bit being read incorrectly
(actually, for our purposes "delivered" rather than "read", taking into
account potential errors in the transmission path). So, yes, the more bits
in a block of data, the more likely you'll see one or more errors. (But the
size of the block still doesn't change the BER, i.e. the chance of any
specific bit being flipped.)
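
To put numbers on that (a quick Python sketch of my own, nothing
bluestore-specific): the per-bit rate stays fixed, and only the per-block
probability of seeing at least one flipped bit grows with the block size:

import math

BER = 1e-15  # probability that any individual bit is delivered flipped

def p_block_error(block_bytes, ber=BER):
    # Probability a block of this size contains >= 1 flipped bit:
    # 1 - (1 - ber)^n_bits, via expm1/log1p to keep precision at tiny rates.
    n_bits = block_bytes * 8
    return -math.expm1(n_bits * math.log1p(-ber))

for size in (4 * 1024, 64 * 1024, 4 * 1024**2):
    print("%10d bytes: P(>=1 bit error) ~ %.3e" % (size, p_block_error(size)))

# The BER itself never changes; a larger block just gets more chances to flip.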

However, my point is, given one or more errors in a block of data, the
chance of a checksum returning a false positive is a function of the size of
the checksum and independent of the size of the data (and independent of the
BER).
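
To put a number on this one as well (again my own sketch, and it leans on
the same "good checksum algorithm" assumption as above, i.e. the checksum
behaves like a uniform hash over corrupted data):

def p_false_positive(checksum_bits):
    # Chance a corrupted block still passes its checksum, assuming the
    # checksum output is effectively uniform over corrupted inputs.
    return 2.0 ** -checksum_bits

for bits in (16, 32, 64):
    print("%2d-bit checksum: P(corruption goes undetected) ~ %.3e"
          % (bits, p_false_positive(bits)))

# ~2.3e-10 for 32 bits, whether the block is 4KB or 5PB -- the amount of
# data covered never enters the calculation.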

Sure, as Sage said, the larger the block of data (and/or the larger the
BER), the more likely you are to see an error, so the more likely you are to
get unlucky and get a false positive in that block.

However, for a given amount of data you have the same overall chance of
getting errors (e.g. 5PB @ BER 1x10^-15 == 5 * 8 * 10^15 * 1 * 10^-15 == 40
bit errors, on average) whether you're reading in small or large blocks, so
once again we're back to the size of the checksum and not the size of the
block telling us how likely it is we'll get a false positive when we get
those errors.
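
Spelling that arithmetic out (my own sketch once more):

total_bytes = 5 * 10**15        # 5PB (decimal petabytes)
total_bits  = total_bytes * 8   # 4e16 bits
ber         = 1e-15

print(total_bits * ber)         # -> 40.0 expected bit errors, however the
                                #    reads are carved up into blocks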

Chris



