On Thu, May 19, 2016 at 08:27:02PM +0300, Igor Fedotov wrote:
> Hi cephers,
>
> please find my initial proposal with regard to bluestore compression
> control and related statistics.
>
> Any comments/thoughts are highly appreciated.
>
> ==================================================================
>
> COMPRESSION CONTROL OPTIONS
>
> One can see the following means to control compression at BlueStore
> level.
>
> 1) Per-store setting to enable/disable compression and specify
> default compression method
>
> bluestore_compression = <zlib | snappy> / <force | optional | disable>
>
> E.g.
>
> bluestore_compression = zlib/force
>
> The first token denotes the default/applied compression algorithm.
> The second one:
>
> 'force' - enables compression for all objects
>
> 'optional' - burdens the caller with the need to enable compression
> by different means (see below)
>
> 'disable' - unconditionally disables any compression for the store.
>
> This option is definitely useful for testing/debugging and has
> probably limited use in production.

If one uses Ceph to store pre-compressed data, an option to disable
additional (Ceph-side) compression would be desirable, at least at the
cluster level; a per-pool setting would be better still. Regarding
'optional' - see below.

> 2) Per-object compression specification. One should be able to
> enable/disable compression for a specific object.
>
> The following sub-options can be provided:
>
> a) Specify compression mode (along with disablement option) at
> object creation
>
> b) Specify compression mode at an arbitrary moment via a specific
> method/ioctl call. Compression to be applied to subsequent write
> requests
>
> c) Force object compression/decompression at an arbitrary moment via
> a specific method/ioctl call. Existing object content to be
> compressed/decompressed and the appropriate mode to be set for
> subsequent write requests.
>
> d) Disable compression for short-lived objects if the corresponding
> hint has been provided via a set_alloc_hint2 call. See the PR at
> https://github.com/ceph/ceph/pull/6208/files/306c5e148cd2f538b3b6c8c2a1a3d5f38ef8e15a#r63775941
>
> Along with a specific compression algorithm, one should be able to
> specify default algorithm selection. E.g. a user can specify
> 'default' compression for an object instead of a specific 'zlib' or
> 'snappy' value.
>
> This way one can avoid the need to care about proper algorithm
> selection for each object.
>
> Default algorithm to be taken from the store setting (see above)
>
> Such an option provides a pretty good level of flexibility. The
> upper level can introduce additional logic to control compression
> this way, e.g. enable/disable it for specific pools or control it
> dynamically depending on how compressible object content is.

I would also add the ability to set a minimum acceptable compression
ratio, with at least two options (any and no-expand). "Any" would
store compressed objects regardless of how well they compressed, and
"no-expand" would store an object in compressed format only if the
compressed size is smaller than the uncompressed size. For zlib,
expansion is more than possible (see "Maximum expansion factor" at
http://www.zlib.net/zlib_tech.html), and storing doubly-compressed
data will yield higher CPU and memory usage while accessing the
object *and* more storage being utilized.

An additional option (set as a percentage or in bytes) specifying the
actual minimum acceptable compression ratio would improve on this
idea further and, for example, improve read performance on large
images (tens of gigabytes) that were compressed by only a few hundred
megabytes; a minimal sketch of such a check follows below.
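Purely as an illustration (none of these names exist in BlueStore),
the write-path decision this implies could be as small as the
following, with the threshold expressed as a compressed/raw size
fraction rather than a percentage or byte count:

  #include <cstdint>

  // Illustrative only, not actual BlueStore code. Decide whether the
  // compressed form of a blob is worth keeping. `required_ratio` is
  // the largest acceptable compressed/raw fraction:
  //   1.0   -> "no-expand": keep compressed data only if it is
  //            strictly smaller than the raw data;
  //   0.875 -> demand at least a 12.5% saving;
  // "any" mode simply skips this check and always keeps the result.
  static bool store_compressed(uint64_t raw_len,
                               uint64_t compressed_len,
                               double required_ratio) {
    return static_cast<double>(compressed_len) <
           static_cast<double>(raw_len) * required_ratio;
  }

When the check fails, the store would write the raw bytes and mark
the extent as uncompressed, so reads never pay a decompression cost
for data that did not actually shrink.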
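Returning to item (d) above: assuming the hint flags from PR #6208
land in librados as proposed (the LIBRADOS_ALLOC_HINT_FLAG_* constant
below comes from that work and is not yet guaranteed API), a client
could mark an object short-lived roughly like this:

  #include <rados/librados.h>

  // Sketch of item (d): hint to the OSD that an object is
  // short-lived, so the store can skip compressing it. Assumes the
  // flag constants proposed in PR #6208; error handling is left to
  // the caller.
  int hint_short_lived(rados_ioctx_t ioctx, const char *oid) {
    // Expected object/write sizes of 0 mean "unknown"; only the
    // lifetime hint matters here.
    return rados_set_alloc_hint2(ioctx, oid,
                                 0,  // expected_object_size
                                 0,  // expected_write_size
                                 LIBRADOS_ALLOC_HINT_FLAG_SHORTLIVED);
  }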
> 3) Per-write request compression control.
>
> This option provides the highest level of flexibility but is
> probably overkill.
>
> Any rationale for having it?

See above. If we're going to have a per-block compression flag, then
writing data in compressed format only when the compression actually
shrank the data would improve read performance later.

--
Piotr Dałek
branch@xxxxxxxxxxxxxxxx