> -----Original Message-----
> From: Igor Fedotov [mailto:ifedotov@xxxxxxxxxxxx]
> Sent: Thursday, March 31, 2016 9:27 AM
> To: Sage Weil <sage@xxxxxxxxxxxx>; Allen Samuels
> <Allen.Samuels@xxxxxxxxxxx>
> Cc: ceph-devel <ceph-devel@xxxxxxxxxxxxxxx>
> Subject: Re: Adding compression/checksum support for bluestore.
>
> On 31.03.2016 1:15, Sage Weil wrote:
> > On Wed, 30 Mar 2016, Allen Samuels wrote:
> >> [snip]
> >>
> >> Time to talk about checksums.
> >>
> >> First let's divide the world into checksums for data and checksums
> >> for metadata -- and defer the discussion about checksums for metadata
> >> (important, but one at a time...)
> >>
> >> I believe it's a requirement that when checksums are enabled that
> >> 100% of data reads must be validated against their corresponding
> >> checksum.
> >> This leads you to conclude that you must store a checksum for each
> >> independently readable piece of data.
> > I'm just worried about the size of metadata if we have 4k checksums
> > but have to read big extents anyway; cheaper to store a 4 byte
> > checksum for each compressed blob.
>
> But do we really need to store checksums as metadata?
> What's about pre(post)fixing 4K-4(?) blob with the checksum and store this
> pair to the disk.
> IMO we always need checksum values along with blob data thus let's store
> and read them together.
> This immediately eliminates the question about the granularity and
> corresponding overhead...
>
> Have I missed something?
>

If you store them inline with the data then nothing lines up on boundaries
that the HW designers expect and you end up doing things like extra-copying
of every data buffer. This will kill performance.

If you store them in a separate place (not in metadata, not contiguous to
data) then you'll have a full extra I/O that might even move the head
(yikes!). Plus you'll have to deal with the RMW of these tiny things.

Putting them in the metadata is really the only viable option.
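
To put rough numbers on the trade-offs above, here is a minimal Python
sketch. It is not BlueStore code: the function names, the 4096-byte device
block, and the use of zlib.crc32 as a stand-in 4-byte checksum are all
assumptions made for illustration only.

```python
import zlib

BLOCK = 4096    # assumed device block size (the "4K" in the thread)
CSUM_LEN = 4    # assumed 4-byte checksum, as in "4 byte checksum" above


def inline_blocks(logical_off, length):
    """First/last device block touched when every device block carries
    (BLOCK - CSUM_LEN) bytes of data plus an inline checksum."""
    payload = BLOCK - CSUM_LEN
    first = logical_off // payload
    last = (logical_off + length - 1) // payload
    return first, last


def metadata_blocks(logical_off, length):
    """First/last device block touched when checksums live in metadata
    and data blocks hold a full BLOCK bytes of data."""
    first = logical_off // BLOCK
    last = (logical_off + length - 1) // BLOCK
    return first, last


def checksum_metadata_bytes(extent_len, csum_chunk):
    """Bytes of checksum metadata for one extent, with one checksum
    stored per csum_chunk bytes of data (rounded up)."""
    chunks = -(-extent_len // csum_chunk)  # ceiling division
    return chunks * CSUM_LEN


def verify(chunk, stored):
    """Validate one independently readable chunk against its checksum.
    zlib.crc32 is only a placeholder for whatever checksum is chosen."""
    return zlib.crc32(chunk) == stored
```

Under these assumptions, an aligned 4K read at logical offset 4096 touches
device blocks 1 and 2 with the inline layout, but only block 1 when the
checksums sit in metadata, which is the alignment problem described above.
On the metadata side, a 1 MiB extent costs 1024 bytes of checksums at 4K
granularity versus 4 bytes with a single per-blob checksum, which is the
overhead Sage is worried about.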