Compression implementation options

Hi folks,

Here is a brief summary of the potential compression implementation options.
I think we should choose the desired approach before starting work on the compression feature.

Comments, additions and fixes are welcome.

Compression At Client - compression/decompression to be performed at the client level (most preferably in RADOS) before sending data to / after receiving data from Ceph.
    Pros:
        * Ceph cluster isn’t loaded with additional computation burden.
        * All Ceph cluster components and data transfers benefit from the reduced data volume.
        * Compression is transparent to Ceph cluster components
    Cons:
        * Weak clients may lack the CPU resources to handle their own traffic.
        * Any read/write access requires at least two sequential requests to the Ceph cluster: the first to retrieve the “original to compressed” offset mapping for the desired data block, the second to get the compressed data block itself (see the sketch after this list).
        * Random write access handling is tricky (see notes below); it may require even more cluster requests per single user request.
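To make the client-side read path concrete, here is a minimal sketch in Python (purely illustrative; the extent-map layout, the callbacks and the class name are all hypothetical, not an existing librados interface), assuming the client first obtains the “original to compressed” extent map and then fetches the covering compressed extents:

import bisect
import zlib

class CompressedObjectReader:
    # Illustrative only: two kinds of round trips - one for the extent map,
    # one (or more) for the compressed data itself.

    def __init__(self, fetch_extent_map, fetch_compressed_extent):
        # Request 1: retrieve the "original to compressed" offset mapping.
        # Each entry: (logical_offset, logical_length, compressed_offset, compressed_length).
        self.extents = sorted(fetch_extent_map())
        self.fetch = fetch_compressed_extent

    def read(self, offset, length):
        # Requests 2..N: fetch and decompress every compressed extent that
        # covers the requested logical range, then cut out the wanted bytes.
        starts = [e[0] for e in self.extents]
        i = max(bisect.bisect_right(starts, offset) - 1, 0)
        end = offset + length
        out = bytearray()
        while i < len(self.extents) and self.extents[i][0] < end:
            lo, llen, coff, clen = self.extents[i]
            raw = zlib.decompress(self.fetch(coff, clen))
            out += raw[max(offset - lo, 0):min(end, lo + llen) - lo]
            i += 1
        return bytes(out)

# Usage with in-memory stand-ins for the two cluster round trips:
blob = zlib.compress(b"hello world" * 100)
reader = CompressedObjectReader(
    lambda: [(0, 1100, 0, len(blob))],        # one extent covering logical bytes 0..1100
    lambda off, ln: blob[off:off + ln])
assert reader.read(22, 11) == b"hello world"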

Compression At Replicated Pool - compression to be performed by the primary OSD at the replicated pool level prior to data replication.
    Pros:
        * Compression CPU load is offloaded from clients onto the cluster.
        * Compression for a specific data block is performed at a single point only, so total CPU utilization across the Ceph cluster is lower (see the worked example below).
        * Underlying Ceph components and data transfers benefit from the reduced data volume.
    Cons:
        * Clients that use EC pools directly get no compression unless it is implemented there too.
        * In the two-tier model, data compression at the cache tier may be inappropriate for performance reasons; compression at the cache tier also prevents cache removal when/if needed.
        * Random write access handling is tricky (see notes below).
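A worked example for the “single point” argument above (illustrative numbers only): with 3x replication and a 4 MB object that compresses 2:1, the primary compresses the object once and sends two copies of ~2 MB to the replicas, so roughly 4 MB cross the replication network instead of 8 MB, and the compression work is done once rather than three times (as it would be if every OSD compressed its own copy, e.g. at the FileStore level).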

Compression At Erasure Coded Pool - compression to be performed by the primary OSD at the EC pool level prior to erasure coding.
    Pros:
        * Compression CPU load is offloaded from clients onto the cluster.
        * Erasure coding “inflates” the processed data block (by up to ~50%), so compressing before EC rather than after it means less data has to be compressed, reducing CPU utilization (see the worked example below).
        * Natural combination with the EC machinery: compression and erasure coding have similar purposes - saving storage space at the cost of CPU usage - so EC infrastructure and design solutions can be reused.
        * No need for random write access support: EC pools don’t provide it on their own, so the same approach can be reused to resolve the issue when needed. This makes the implementation much easier.
        * Underlying Ceph components and data transfers benefit from the reduced data volume.
    Cons:
        * Limited applicability - clients that don’t use EC pools lack compression.
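A back-of-the-envelope example for the inflation figure above (assuming an EC profile of k=4 data chunks + m=2 coding chunks, i.e. (k+m)/k = 1.5x inflation): a 4 MB object that compresses 2:1 before erasure coding gives 2 MB of EC input and ~3 MB of chunks on disk, whereas compressing after EC (e.g. at the FileStore level) means ~6 MB of chunks have to go through the compressor - roughly 50% more data to compress for the same object.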

Compression At Ceph FileStore - compression to be performed by the Ceph FileStore component prior to saving object data to the underlying file system.
    Pros:
        * Compression CPU load is offloaded from clients onto the cluster.

    Cons:
        * Random write access is tricky (see notes below).
        * From the cluster perspective, compression is performed either on each replicated copy or on a block “inflated” by erasure coding, so total Ceph cluster CPU utilization for compression becomes considerably higher (roughly a threefold increase for replicated pools and ~50% for EC pools; cf. the worked examples above).
        * No benefit in reduced data transfers over the net.
        * A recovery procedure caused by an OSD going down triggers decompression and recompression of the complete data set when an EC pool is used. This might considerably increase CPU utilization for the recovery process.

Compression Externally at File System - compression to be performed at the FileStore node by means of the underlying file system.
    Pros:
        * Compression is (mostly) transparent to Ceph
        * Compression CPU load is offloaded from clients onto the cluster.
    Cons:
        * File system “lock-in” - only the BTRFS file system can be used for now, and its production readiness is questionable.
        * Limited flexibility - compression is a partition/mount point property, so finer granularity (per-pool or per-object) is hard to achieve and there is no way to disable compression (see the note below).
        * From the cluster perspective, compression is performed either on each replicated copy or on a block “inflated” by erasure coding, so total Ceph cluster CPU utilization for compression becomes considerably higher (roughly a threefold increase for replicated pools and ~50% for EC pools).
        * No benefit in reduced data transfers over the net.
        * A recovery procedure caused by an OSD going down triggers decompression and recompression of the complete data set when an EC pool is used. This might considerably increase CPU utilization for the recovery process.
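A note on the granularity point above: file-system-level compression is effectively a mount-time setting; on BTRFS it is typically enabled with something like the compress=zlib or compress=lzo mount option applied to the whole OSD data partition, which is why per-pool or per-object control is hard to achieve at this level.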

Compression Externally at Block Device - compression to be performed at the FileStore node by means of an underlying block device that supports inline data compression.
    Pros:
        * Compression is transparent to Ceph
        * Compression CPU load is offloaded from clients onto the cluster.
    Cons:
        * A production-quality solution seems to be absent.
        * Limited flexibility - compression is a partition/mount point property, so finer granularity (per-pool or per-object) is hard to achieve and there is no way to disable compression.
        * From the cluster perspective, compression is performed either on each replicated copy or on a block “inflated” by erasure coding, so total Ceph cluster CPU utilization for compression becomes considerably higher (roughly a threefold increase for replicated pools and ~50% for EC pools).
        * No benefit in reduced data transfers over the net.
        * A recovery procedure caused by an OSD going down triggers decompression and recompression of the complete data set when an EC pool is used. This might considerably increase CPU utilization for the recovery process.

Notes:
Probably the most troublesome issue introduced by compression is random write access handling. A brief overview of the problem: the compressing entity processes the original data blocks for a specific object and eventually saves a set of new compressed blocks to storage. Since different blocks can have different compression ratios, the new blocks are variable in size. When a write request arrives from the client for a data range that overlaps existing data, the resulting compressed block has to be stored somewhere; due to the differing compression ratio it may not fit into the space allocated for the previous block. Moreover, if the new write request isn’t aligned with the original one, the previous block may be invalidated only partially. Thus the flat, sequential model of keeping object data no longer works; instead, one needs a more elaborate scheme to store, access and overwrite object content (a toy sketch of the write path follows the link below). More details on both the issue and a potential implementation approach can be found here (sections I & II):
http://users.ics.forth.gr/~bilas/pdffiles/makatos-snapi10.pdf
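To illustrate the issue, here is a toy sketch in Python (not a proposed design; the block size, structures and names are all made up) of what an unaligned, overlapping write has to do once object data is stored as variable-size compressed blocks:

import zlib

BLOCK = 64 * 1024  # fixed logical block size; compressed sizes vary per block

class CompressedStore:
    def __init__(self):
        self.log = bytearray()   # append-only area holding compressed blocks
        self.map = {}            # logical block index -> (offset, length) within self.log

    def _read_block(self, idx):
        # Decompress an existing block, or return a zero-filled one for a hole.
        if idx not in self.map:
            return bytearray(BLOCK)
        off, ln = self.map[idx]
        return bytearray(zlib.decompress(bytes(self.log[off:off + ln])))

    def write(self, offset, data):
        # Overlapping/unaligned write: read-modify-write every touched logical
        # block, then recompress it. The recompressed block rarely fits where
        # the old one sat, so it is appended at the end and the old extent
        # becomes garbage that some cleanup scheme would have to reclaim.
        end = offset + len(data)
        for idx in range(offset // BLOCK, (end + BLOCK - 1) // BLOCK):
            blk = self._read_block(idx)
            lo = max(offset, idx * BLOCK) - idx * BLOCK
            hi = min(end, (idx + 1) * BLOCK) - idx * BLOCK
            blk[lo:hi] = data[max(offset, idx * BLOCK) - offset:
                              min(end, (idx + 1) * BLOCK) - offset]
            comp = zlib.compress(bytes(blk))
            self.map[idx] = (len(self.log), len(comp))
            self.log += comp

# Example: an 8 KiB write at offset 60 KiB touches logical blocks 0 and 1,
# forcing both to be decompressed, patched, recompressed and relocated.
store = CompressedStore()
store.write(0, bytes(128 * 1024))           # initial 128 KiB -> blocks 0 and 1
store.write(60 * 1024, b"x" * 8 * 1024)     # unaligned, overlapping write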

Thanks,
Igor.



