RE: Adding compression/checksum support for bluestore.

> -----Original Message-----
> From: Igor Fedotov [mailto:ifedotov@xxxxxxxxxxxx]
> Sent: Thursday, March 31, 2016 10:18 AM
> To: Allen Samuels <Allen.Samuels@xxxxxxxxxxx>; Sage Weil
> <sage@xxxxxxxxxxxx>
> Cc: ceph-devel <ceph-devel@xxxxxxxxxxxxxxx>
> Subject: Re: Adding compression/checksum support for bluestore.
> 
> 
> 
> On 31.03.2016 19:32, Allen Samuels wrote:
> >> But do we really need to store checksums as metadata? What about
> >> prefixing (or postfixing) each 4K-4(?) byte blob with its checksum and storing the pair on
> >> disk? IMO we always need the checksum values along with the blob data,
> >> so let's store and read them together. This immediately eliminates
> >> the question of granularity and the corresponding overhead... Have
> >> I missed something?
> > If you store them inline with the data, then nothing lines up on the boundaries
> that the HW designers expect, and you end up doing things like extra copying
> of every data buffer. This will kill performance.
> 
> Perhaps you are right.
> 
> But I'm not sure I fully understand which HW designers you mean here. Are you
> considering the case where Ceph is embedded into some hardware, and
> incoming RW requests always operate on aligned data and are supposed to keep
> the same alignment for data saved to disk?

Dig into the direct I/O stuff. You'll see all sorts of places where the data is required to be either 512-byte or page-aligned. This stems from the HW implementations of HBA, SCSI and SATA controllers.
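To make the alignment point concrete, here is a minimal Python sketch of the inline layout proposed above, assuming 4096-byte device blocks, a 4-byte checksum prefixed to each 4092-byte ("4K-4") payload, and zlib.crc32 as a stand-in checksum. The constants and the helpers dev_offset() and pack_inline() are hypothetical illustrations, not BlueStore code. The sketch shows that page-aligned logical offsets land at device offsets that are not even 512-byte aligned, and that building the on-disk image means copying every chunk into a new buffer.

```python
import struct
import zlib

DEV_BLOCK = 4096                  # assumed direct-I/O alignment unit (page size)
SECTOR = 512                      # smallest alignment many HBAs accept
CSUM_SIZE = 4                     # hypothetical inline 32-bit checksum
PAYLOAD = DEV_BLOCK - CSUM_SIZE   # the "4K-4" data portion of each block


def dev_offset(logical: int) -> int:
    """Device offset of a logical data byte when every block is [csum][payload]."""
    block, within = divmod(logical, PAYLOAD)
    return block * DEV_BLOCK + CSUM_SIZE + within


def pack_inline(data: bytes) -> bytes:
    """Put a checksum in front of each PAYLOAD-sized chunk.

    Every chunk is sliced and concatenated into a new buffer, so the
    caller's page-aligned buffer cannot be handed to the device as-is.
    """
    out = bytearray()
    for i in range(0, len(data), PAYLOAD):
        chunk = data[i:i + PAYLOAD].ljust(PAYLOAD, b"\0")  # zero-pad the tail
        out += struct.pack("<I", zlib.crc32(chunk)) + chunk
    return bytes(out)


if __name__ == "__main__":
    for off in range(0, 4 * DEV_BLOCK, DEV_BLOCK):
        d = dev_offset(off)
        print(off, "->", d, "aligned" if d % SECTOR == 0 else "misaligned")
    user_buf = bytes(2 * DEV_BLOCK)     # two pages of user data
    print(len(pack_inline(user_buf)))   # occupies three device blocks
```

Running it maps logical offsets 0, 4096, 8192 and 12288 to device offsets 4, 4104, 8204 and 12304, none of them 512-byte aligned, and two pages of user data end up occupying three device blocks. Postfixing the checksum instead keeps the first payload aligned, but each later block is still shifted by 4 bytes for every block before it.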

> 
> IMHO, proper data alignment in incoming requests is a special
> case; in general we don't have such a guarantee. Moreover, compression
> completely destroys whatever alignment there was. Thus in many cases we can easily append
> an additional data portion containing a checksum.
> 
> >
> > If you store them in a separate place (not in metadata, not contiguous with the
> data), then you'll have a full extra I/O that might even move the head
> (yikes!). Plus you'll have to deal with the RMW of these tiny things.
> Agreed, that's not an option.
> > Putting them in the metadata is really the only viable option.
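For contrast, a minimal sketch of the metadata approach, assuming one 32-bit checksum per fixed-size chunk kept in a per-blob metadata record while the data itself goes to the device untouched at aligned offsets. BlobMeta, CSUM_CHUNK, write_blob(), verify_read() and metadata_overhead() are hypothetical names, and zlib.crc32 again stands in for whatever checksum the store actually uses.

```python
import zlib
from dataclasses import dataclass, field
from typing import List

CSUM_CHUNK = 4096   # hypothetical checksum granularity (a tunable trade-off)


@dataclass
class BlobMeta:
    """Hypothetical per-blob metadata record: one checksum per chunk."""
    chunk_size: int = CSUM_CHUNK
    csums: List[int] = field(default_factory=list)


def write_blob(data: bytes, meta: BlobMeta) -> bytes:
    """Record checksums in meta and return the data bytes unchanged."""
    if len(data) % meta.chunk_size:
        raise ValueError("sketch assumes chunk-aligned writes")
    meta.csums = [zlib.crc32(data[i:i + meta.chunk_size])
                  for i in range(0, len(data), meta.chunk_size)]
    return data  # nothing interleaved, so no bounce-buffer copy


def verify_read(disk: bytes, meta: BlobMeta, offset: int, length: int) -> bytes:
    """Verify every chunk overlapping [offset, offset + length), then return it."""
    cs = meta.chunk_size
    for idx in range(offset // cs, (offset + length - 1) // cs + 1):
        if zlib.crc32(disk[idx * cs:(idx + 1) * cs]) != meta.csums[idx]:
            raise OSError(f"checksum mismatch in chunk {idx}")
    return disk[offset:offset + length]


def metadata_overhead(object_size: int, chunk_size: int = CSUM_CHUNK,
                      csum_bytes: int = 4) -> int:
    """Bytes of checksum metadata for an object (partial chunks round up)."""
    return -(-object_size // chunk_size) * csum_bytes
```

With these assumptions metadata_overhead(4 * 1024 * 1024) is 4096, i.e. 4 KiB of checksums for a 4 MiB object at 4 KiB granularity, dropping to 256 bytes at 64 KiB granularity. The cost of a coarser chunk shows up in verify_read(): a small read inside a large chunk still has to fetch and checksum the whole chunk.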
