RE: Adding compression/checksum support for bluestore.


 



> -----Original Message-----
> From: Igor Fedotov [mailto:ifedotov@xxxxxxxxxxxx]
> Sent: Thursday, March 31, 2016 9:27 AM
> To: Sage Weil <sage@xxxxxxxxxxxx>; Allen Samuels
> <Allen.Samuels@xxxxxxxxxxx>
> Cc: ceph-devel <ceph-devel@xxxxxxxxxxxxxxx>
> Subject: Re: Adding compression/checksum support for bluestore.
> 
> 
> 
> On 31.03.2016 1:15, Sage Weil wrote:
> > On Wed, 30 Mar 2016, Allen Samuels wrote:
> >> [snip]
> >>
> >> Time to talk about checksums.
> >>
> >> First let's divide the world into checksums for data and checksums
> >> for metadata -- and defer the discussion about checksums for metadata
> >> (important, but one at a time...)
> >>
> >> I believe it's a requirement that, when checksums are enabled,
> >> 100% of data reads must be validated against their corresponding
> >> checksum.
> >> This leads you to conclude that you must store a checksum for each
> >> independently readable piece of data.
> > I'm just worried about the size of metadata if we have 4k checksums
> > but have to read big extents anyway; cheaper to store a 4 byte
> > checksum for each compressed blob.
> 
> But do we really need to store checksums as metadata?
> What about pre(post)fixing each 4K-4(?) blob with its checksum and storing
> this pair on disk?
> IMO we always need the checksum values along with the blob data, so let's
> store and read them together.
> This immediately eliminates the question about the granularity and
> corresponding overhead...
> 
> Have I missed something?
> 

If you store them inline with the data, then nothing lines up on the boundaries that the hardware designers expect, and you end up doing things like an extra copy of every data buffer. That will kill performance.
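
To make the alignment problem concrete, here is a minimal sketch of the offset arithmetic. It assumes a 4096-byte device block, a 4-byte checksum stored in the last 4 bytes of each block (the postfix variant), and two hypothetical helper names; none of this is BlueStore code. With only 4092 payload bytes per block, a logical 4K read that is aligned from the client's point of view spans two device blocks, and its payload has to be gathered from non-contiguous pieces.

```python
BLOCK = 4096             # device block size in bytes (assumed)
CSUM = 4                 # checksum size in bytes (assumed, e.g. a 32-bit CRC)
PAYLOAD = BLOCK - CSUM   # data bytes that fit in one block: 4092


def blocks_touched(logical_off, length):
    """Indices of the device blocks needed to serve a logical read."""
    first = logical_off // PAYLOAD
    last = (logical_off + length - 1) // PAYLOAD
    return list(range(first, last + 1))


def payload_segments(logical_off, length):
    """(disk_offset, nbytes) pieces that must be gathered into one buffer."""
    segments = []
    end = logical_off + length
    while logical_off < end:
        block, within = divmod(logical_off, PAYLOAD)
        nbytes = min(PAYLOAD - within, end - logical_off)
        # Payload starts at the beginning of each block; the checksum
        # occupies the last CSUM bytes and is skipped here.
        segments.append((block * BLOCK + within, nbytes))
        logical_off += nbytes
    return segments


if __name__ == "__main__":
    print(blocks_touched(0, 4096))
    print(payload_segments(4096, 4096))
```

Every logical 4K read in this layout touches two device blocks, and the second segment never starts on a 4K boundary in the destination buffer, which is where the extra copy comes from.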

If you store them in a separate place (not in metadata, not contiguous to data), then you'll have a full extra I/O that might even move the head (yikes!). Plus you'll have to deal with the read-modify-write (RMW) of these tiny things.

Putting them in the metadata is really the only viable option.
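
The granularity trade-off raised earlier in the thread (a checksum per 4K versus one per larger blob) can be put into rough numbers. This is a back-of-the-envelope sketch only: the 4 MiB object size, the 64 KiB blob size, the 4-byte checksum, and both function names are illustrative assumptions, not values or APIs from BlueStore.

```python
def checksum_metadata_bytes(data_bytes, csum_block, csum_size=4):
    """Metadata needed to hold one checksum per csum_block of data."""
    n_blocks = -(-data_bytes // csum_block)  # ceiling division
    return n_blocks * csum_size


def bytes_read_to_validate(offset, length, csum_block):
    """Bytes that must be read so every checksum covering the range can be verified."""
    start = (offset // csum_block) * csum_block
    end = -(-(offset + length) // csum_block) * csum_block
    return end - start


if __name__ == "__main__":
    obj = 4 * 1024 * 1024  # assumed 4 MiB object
    for granularity in (4096, 65536):
        meta = checksum_metadata_bytes(obj, granularity)
        read = bytes_read_to_validate(4096, 4096, granularity)
        print(granularity, meta, read)
```

Under these assumptions, 4K checksums cost 4096 bytes of metadata per 4 MiB object and let a 4K read be validated by reading 4K. One checksum per 64K cuts the metadata to 256 bytes, but the same 4K read must then fetch the whole 64K to be validated.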


