Re: Initial proposal for bluestore compression control and statistics

On 19.05.2016 20:57, Piotr Dałek wrote:
On Thu, May 19, 2016 at 08:27:02PM +0300, Igor Fedotov wrote:
Hi cephers,

please find my initial proposal with regard to bluestore compression
control and related statistics.

Any comments/thoughts are highly appreciated.

==================================================================

COMPRESSION CONTROL OPTIONS

One can see the following means to control compression at the BlueStore level.

1) Per-store setting to enable/disable compression and specify
default compression method

bluestore_compression = <zlib | snappy> / <force | optional | disable>

E.g.

bluestore_compression = zlib/force

The first token denotes default/applied compression algorithm.
The second one:

'force' - enables compression for any objects

'optional' - burdens the caller with the need to enable compression
by other means (see below)

'disable' - unconditionally disables any compression for the store.

This option is definitely useful for testing/debugging and probably
has limited use in production.
If one uses Ceph to store pre-compressed data, an option to disable
additional (Ceph-side) compression would be desirable, at least
cluster-wide, though a per-pool setting would be better.
Regarding optional - see below.
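
To make the proposed "algorithm/mode" format concrete, here is a minimal
Python sketch of how such a value might be parsed and validated. The
function and type names are hypothetical and not part of Ceph; only the
token values (zlib, snappy, force, optional, disable) come from the
proposal above.

```python
from typing import NamedTuple

# Token values taken from the proposal; everything else is illustrative.
ALGORITHMS = {"zlib", "snappy"}
MODES = {"force", "optional", "disable"}


class CompressionSetting(NamedTuple):
    algorithm: str  # default/applied compression algorithm
    mode: str       # force | optional | disable


def parse_bluestore_compression(value: str) -> CompressionSetting:
    """Parse a value such as 'zlib/force' (hypothetical helper)."""
    algorithm, sep, mode = value.partition("/")
    algorithm = algorithm.strip().lower()
    mode = mode.strip().lower()
    if not sep or algorithm not in ALGORITHMS or mode not in MODES:
        raise ValueError(f"invalid bluestore_compression value: {value!r}")
    return CompressionSetting(algorithm, mode)
```

With this sketch, "zlib/force" parses to algorithm 'zlib' and mode
'force', while an unknown token is rejected with ValueError.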

2) Per-object compression specification. One should be able to
enable/disable compression for a specific object.

The following sub-options can be provided:

   a) Specify the compression mode (including a 'disable' option) at
object creation

   b) Specify the compression mode at an arbitrary moment via a specific
method/ioctl call. Compression is then applied to subsequent write
requests

   c) Force object compression/decompression at an arbitrary moment via
a specific method/ioctl call. Existing object content is
compressed/decompressed and the appropriate mode is set for
subsequent write requests.

   d) Disable compression for short-lived objects if the corresponding
hint has been provided via a set_alloc_hint2 call. See the PR at https://github.com/ceph/ceph/pull/6208/files/306c5e148cd2f538b3b6c8c2a1a3d5f38ef8e15a#r63775941

Along with a specific compression algorithm, one should be able to
specify default algorithm selection. E.g. the user can specify 'default'
compression for an object instead of a specific 'zlib' or 'snappy'
value.

This way one can avoid the need to care about the proper algorithm
selection for each object.

The default algorithm is taken from the store setting (see above).
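
One possible reading of how the store setting and the per-object mode
combine is sketched below in Python. The function name and the
representation of the per-object mode are hypothetical. The sketch also
assumes that under 'force' the store algorithm is always applied and any
per-object request is ignored; the proposal does not spell this out.

```python
from typing import Optional


def effective_algorithm(store_algorithm: str,
                        store_mode: str,
                        object_mode: Optional[str]) -> Optional[str]:
    """Return the algorithm to use for an object's writes, or None.

    object_mode is None (nothing requested), 'disable', 'default',
    or a concrete algorithm name such as 'zlib' or 'snappy'.
    """
    if store_mode == "disable":
        # Store-wide disable wins unconditionally.
        return None
    if store_mode == "force":
        # Assumption: 'force' applies the store algorithm to every object.
        return store_algorithm
    # store_mode == 'optional': the caller must opt in per object.
    if object_mode is None or object_mode == "disable":
        return None
    if object_mode == "default":
        # 'default' defers the algorithm choice to the store setting.
        return store_algorithm
    return object_mode
```

For example, with the store set to zlib/optional, an object that asks
for 'default' gets zlib, one that asks for 'snappy' gets snappy, and
one that asks for nothing is left uncompressed.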

Such an option provides a pretty good level of flexibility. An upper
level can introduce additional logic to control compression this
way, e.g. enable/disable it for specific pools or control it
dynamically depending on how compressible the object content is.
I would also add the ability to set a minimum acceptable compression
ratio, with at least two options ("any" and "no-expand"). "Any" would store
compressed objects regardless of how well they compressed, and "no-expand"
would store an object in compressed format only if the compressed size is
smaller than the uncompressed size.
Why do we need the "Any" option? Isn't "No-expand" enough?
For zlib, expansion is more than possible (see "Maximum
expansion factor" at http://www.zlib.net/zlib_tech.html), and storing
doubly-compressed data will yield higher CPU and memory usage while
accessing the object *and* more storage being used. An additional option
(set as a percentage or in bytes) specifying the actual minimum acceptable
compression ratio would improve on this idea further and, for example,
improve read performance on large images (tens of gigabytes) that were
compressed by only a few hundred megabytes.
Sounds good.
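
As a rough illustration of the "no-expand" rule and the additional
minimum-saving threshold discussed above, here is a small Python sketch
using the standard zlib module. The function name and the
min_saving_percent parameter are hypothetical; expressing the threshold
as "percent of the original size saved" is just one interpretation of
the "minimum acceptable compression ratio".

```python
import zlib
from typing import Tuple


def maybe_compress(data: bytes,
                   min_saving_percent: float = 0.0) -> Tuple[bytes, bool]:
    """Return (payload, is_compressed).

    The compressed form is kept only if it is strictly smaller than the
    input ("no-expand") and saves at least min_saving_percent of the
    original size. Otherwise the raw data is returned unchanged.
    """
    compressed = zlib.compress(data)
    saved = len(data) - len(compressed)
    required = len(data) * min_saving_percent / 100.0
    if saved > 0 and saved >= required:
        return compressed, True
    return data, False
```

With the default threshold of 0 this is plain "no-expand": highly
repetitive data is stored compressed, while random or already
compressed data is stored as-is. Raising the threshold also rejects
data that shrinks only marginally.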

3) Per-write request compression control.

This option provides the highest level of flexibility but is
probably overkill.

Any rationale for having it?
See above. If we're going to have a per-block compression flag, then
writing data in compressed format only if compression actually shrank it
would improve read performance later.
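
The per-block flag idea can be sketched as follows in Python. The
one-byte flag prefix and the function names are purely illustrative and
say nothing about how BlueStore would actually lay out its metadata.

```python
import zlib

# Hypothetical flag values for an illustrative one-byte block header.
FLAG_RAW = 0
FLAG_ZLIB = 1


def encode_block(data: bytes) -> bytes:
    """Store the block compressed only if that actually shrinks it."""
    compressed = zlib.compress(data)
    if len(compressed) < len(data):
        return bytes([FLAG_ZLIB]) + compressed
    return bytes([FLAG_RAW]) + data


def decode_block(record: bytes) -> bytes:
    """Read a block back; raw blocks skip decompression entirely."""
    flag, payload = record[0], record[1:]
    if flag == FLAG_ZLIB:
        return zlib.decompress(payload)
    if flag == FLAG_RAW:
        return payload
    raise ValueError(f"unknown block flag: {flag}")
```

The read-side benefit is visible in decode_block: a block that did not
shrink is returned directly, with no decompression cost.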

