RE: Adding compression support for bluestore.

> -----Original Message-----
> From: Sage Weil [mailto:sage@xxxxxxxxxxxx]
> Sent: Wednesday, March 16, 2016 2:30 PM
> To: Allen Samuels <Allen.Samuels@xxxxxxxxxxx>
> Cc: Igor Fedotov <ifedotov@xxxxxxxxxxxx>; ceph-devel <ceph-
> devel@xxxxxxxxxxxxxxx>
> Subject: RE: Adding compression support for bluestore.
> 
> On Wed, 16 Mar 2016, Allen Samuels wrote:
> > As described earlier, we can easily afford the cost of setting
> > min_alloc_size to 4KB. I don't see any advantage in handling the
> > larger allocation sizes -- only disadvantages.
> 
> That too.  The original motivation was driven by HDD behavior: if we have a
> 4KB overwrite we're better off doing a WAL record and async overwrite than
> allocating a new 4KB extent and overfragmenting the object.  But the same
> thing can be accomplished as policy in _do_write without restricting the size
> of allocations.

Agreed. But the size of allocations affects the compression ratio too. Effectively you're rounding every compressed block up to a multiple of min_alloc_size. A bigger compression block size tends to compensate for this, but you pay for it in WAL volume and RMW cost.
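
To put rough numbers on the rounding effect, here is a small Python sketch. It is not BlueStore code: the function names and the 0.45 raw compression ratio are assumptions picked purely for illustration.

```python
KB = 1024

def allocated_bytes(compressed_len, min_alloc_size):
    """Round a compressed length up to a whole number of allocation units."""
    units = -(-compressed_len // min_alloc_size)  # ceiling division
    return units * min_alloc_size

def effective_ratio(block_size, raw_ratio, min_alloc_size):
    """On-disk size divided by uncompressed size, after rounding.

    raw_ratio is the compressor's own output/input ratio (hypothetical input).
    """
    compressed_len = int(block_size * raw_ratio)
    return allocated_bytes(compressed_len, min_alloc_size) / block_size

# Compare a small and a large compression block under two min_alloc_size values.
for block in (64 * KB, 256 * KB):
    for min_alloc in (4 * KB, 64 * KB):
        ratio = effective_ratio(block, 0.45, min_alloc)
        print(f"block={block // KB}KB min_alloc={min_alloc // KB}KB "
              f"effective ratio={ratio:.3f}")
```

With a 64KB block and a 64KB min_alloc_size the rounding eats the entire saving; going to a 256KB block or a 4KB min_alloc_size recovers most of it.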

> 
> This is all assuming we get the allocator/freelist memory under control, which
> we need to do anyway.

Yes, see my previous e-mails. I believe they describe one solution (I'm sure there are others). I'm trying to hack some of that code together now, just to make sure I haven't missed anything.

Assuming that my outlined solution is essentially correct, min_alloc_size can be fixed at 4KB with no downsides. This makes selecting the compression block size much easier, since it limits the interaction between parameters.
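
As a back-of-the-envelope illustration of what is left to tune once min_alloc_size is pinned at 4KB, the Python sketch below tabulates two quantities per candidate compression block size. Both rules of thumb come from the messages quoted further down: an average loss of half a min_alloc_size per compressed block, and up to two compression blocks of WAL data for a write that overlaps a head and a tail block. The function names and the list of candidate sizes are my own assumptions, not anything in the code base.

```python
KB = 1024
MIN_ALLOC_SIZE = 4 * KB  # fixed, per the proposal above

def avg_rounding_loss_fraction(compression_block, min_alloc_size=MIN_ALLOC_SIZE):
    """Average space lost to rounding, as a fraction of the uncompressed block.

    Assumes half an allocation unit is lost per compressed block on average.
    """
    return (min_alloc_size / 2) / compression_block

def worst_case_wal_bytes(compression_block):
    """Upper bound on WAL payload for one write that partially overlaps
    one compressed block at its head and another at its tail."""
    return 2 * compression_block

# Hypothetical candidate sizes; only the compression block varies.
for block in (64 * KB, 128 * KB, 256 * KB, 512 * KB):
    loss = avg_rounding_loss_fraction(block)
    wal = worst_case_wal_bytes(block)
    print(f"{block // KB:4d}KB block: avg rounding loss {loss:.2%}, "
          f"worst-case WAL {wal // KB}KB")
```

The rounding loss stays small across the whole range, so the choice of block size reduces to how much WAL/RMW cost you are willing to accept.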

> 
> sage
> 
> 
> >
> > Allen Samuels
> > Software Architect, Fellow, Systems and Software Solutions
> >
> > 2880 Junction Avenue, San Jose, CA 95134
> > T: +1 408 801 7030| M: +1 408 780 6416 allen.samuels@xxxxxxxxxxx
> >
> >
> > > -----Original Message-----
> > > From: Sage Weil [mailto:sage@xxxxxxxxxxxx]
> > > Sent: Wednesday, March 16, 2016 2:15 PM
> > > To: Allen Samuels <Allen.Samuels@xxxxxxxxxxx>
> > > Cc: Igor Fedotov <ifedotov@xxxxxxxxxxxx>; ceph-devel <ceph-
> > > devel@xxxxxxxxxxxxxxx>
> > > Subject: RE: Adding compression support for bluestore.
> > >
> > > On Wed, 16 Mar 2016, Allen Samuels wrote:
> > > > > A potential issue with using WAL for compressed block overwrites
> > > > > is a significant WAL data volume increase. IIUC currently a WAL
> > > > > record can have up to 2*bluestore_min_alloc_size (i.e. 128KB)
> > > > > of client data per single write request
> > > > > - overlapped head and tail.
> > > > > In case of compressed blocks this will be up to
> > > > > 2*bluestore_max_compressed_block (i.e. 8MB) as you can't
> > > > > simply overwrite fully overlapped extents - one has to operate
> > > > > on compression blocks now...
> > > > >
> > > > > Seems attractive otherwise...
> > > >
> > > > This is one of the fundamental tradeoffs with compression. When
> > > > your compression block size exceeds the minimum I/O size you
> > > > either have to consume time (RMW + uncompress/recompress) or you
> > > > have to consume space (overlapping extents). Sage's current code
> > > > essentially starts out by consuming space and then assumes in the
> > > > background that he'll consume time to recover the space.
> > > > Of course if you set the compression block size equal to or
> > > > smaller than the minimum I/O size you can avoid these problems --
> > > > but you create others (including poor compression, needing to
> > > > track very small chunks of space, etc.) and nobody seriously
> > > > believes that this is a viable alternative.
> > >
> > > My inclination would be to set min_alloc_size to something smallish
> > > (if not 64KB, then 32KB perhaps) and the compression_block to
> > > something also reasonable (256KB or 512KB at most).  That means you
> > > lose some of the savings (on average, 1/2 of min_alloc_size) which
> > > is more significant if compression_block is not >> min_alloc_size,
> > > but it avoids the expensive r/m/w cases and big read + decompress
> > > for a small read request...
> > >
> > > sage
> >
> >