A way to reduce compression overhead

Hi Sage, et al.

Let me share some ideas about reducing the compression burden on the cluster.

As is known, we perform block compression at the BlueStore level for each replica independently. With three replicas this triples the compression CPU overhead for the cluster. That looks like a significant waste of CPU resources, IMHO.

We can probably eliminate this overhead by introducing write request preprocessing, performed synchronously at the ObjectStore level. This preprocessing parses the transaction, detects write requests and transforms them into different ones aligned with the current store allocation unit (AU). At the same time, resulting extents that span more than a single AU are compressed if needed. I.e. preprocessing does some of the job currently performed in BlueStore::_do_write_data, which splits a write request into _do_write_small/_do_write_big calls. But after the split and big blob compression, the preprocessor simply updates the transaction with the new write requests.

E.g.

with AU = 0x1000

Write Request (1~0xfffe, in offset~length notation) is transformed into the following sequence:

WriteX 1~0xfff (uncompressed)

WriteX 0x1000~0xe000 (compressed if needed)

WriteX 0xf000~0xfff (uncompressed)
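
The split in the example above can be sketched roughly as follows. This is only an illustration, not BlueStore code: split_write is a hypothetical helper, the pieces are read as offset~length (the three pieces cover 0xfffe bytes starting at offset 1), and "compressed if needed" is modelled as a plain flag set on an aligned piece that spans more than one AU.

```python
def split_write(offset, length, au):
    """Split a write into an unaligned head, an AU-aligned middle and an
    unaligned tail. Returns (offset, length, compress_candidate) tuples.

    Hypothetical helper for illustration; not an actual ObjectStore API.
    """
    if length <= 0:
        return []
    end = offset + length
    aligned_start = -(-offset // au) * au   # offset rounded up to an AU boundary
    aligned_end = (end // au) * au          # end rounded down to an AU boundary
    head_end = min(aligned_start, end)
    tail_start = max(aligned_end, head_end)

    pieces = []
    if offset < head_end:                   # unaligned head, left uncompressed
        pieces.append((offset, head_end - offset, False))
    if head_end < tail_start:               # aligned middle, candidate if > 1 AU
        mid_len = tail_start - head_end
        pieces.append((head_end, mid_len, mid_len > au))
    if tail_start < end:                    # unaligned tail, left uncompressed
        pieces.append((tail_start, end - tail_start, False))
    return pieces


for off, ln, cand in split_write(0x1, 0xfffe, 0x1000):
    note = "compressed if needed" if cand else "uncompressed"
    print(f"WriteX {off:#x}~{ln:#x} ({note})")
```

A write that fits inside one AU comes back as a single uncompressed piece, and a write that is already AU-aligned comes back as a single aligned piece.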

The updated transaction is then passed to all replicas, including the master one, using the regular apply_/queue_transaction mechanics.


As a bonus, one gets automatic payload compression when transporting the request to remote store replicas. The regular write request path should be preserved for EC pools and other needs as well.
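
To make the "compress once, apply everywhere" idea concrete, here is a toy model. Everything in it is an assumption made for illustration: zlib stands in for whatever compressor the store is configured with, "if needed" is simplified to "only if the result is smaller", and Replica/preprocess are hypothetical names rather than Ceph interfaces.

```python
import zlib

compress_calls = 0  # counts how often the compressor is actually run


def maybe_compress(data):
    """Compress once; keep the result only if it is smaller than the input."""
    global compress_calls
    compress_calls += 1
    packed = zlib.compress(data)
    return (packed, True) if len(packed) < len(data) else (data, False)


def preprocess(pieces):
    """Turn (offset, data, candidate) pieces into WriteX-like ops.

    Only candidate pieces (AU-aligned, larger than one AU) are compressed.
    """
    ops = []
    for offset, data, candidate in pieces:
        payload, compressed = maybe_compress(data) if candidate else (data, False)
        ops.append((offset, payload, compressed))
    return ops


class Replica:
    """Hypothetical store that applies ops as-is and never compresses."""

    def __init__(self):
        self.blobs = {}

    def apply(self, ops):
        for offset, payload, compressed in ops:
            self.blobs[offset] = (payload, compressed)

    def read(self, offset):
        payload, compressed = self.blobs[offset]
        return zlib.decompress(payload) if compressed else payload


pieces = [
    (0x1, b"h" * 0xfff, False),      # unaligned head
    (0x1000, b"m" * 0xe000, True),   # aligned middle, compression candidate
    (0xf000, b"t" * 0xfff, False),   # unaligned tail
]
ops = preprocess(pieces)             # done once, before fan-out
replicas = [Replica() for _ in range(3)]
for replica in replicas:
    replica.apply(ops)               # same ops go to every replica

raw_bytes = sum(len(data) for _, data, _ in pieces)
wire_bytes = sum(len(payload) for _, payload, _ in ops)
```

In this model the compressor runs once for three replicas, and wire_bytes (what would be shipped to each remote replica) is smaller than raw_bytes because the middle piece travels in compressed form.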

Please note that almost no latency is introduced into request handling. Replicas receive the modified transaction later, but they do not spend time on the split/compress work.

There is a potential conflict with the current garbage collection logic though: we can't perform GC during preprocessing because of a possible race with preceding unfinished transactions, and consequently we're unable to merge data and compress the merged result at that stage. Well, we can do that when applying the transaction, but this will produce a sequence like this at each replica:

decompress original request + decompress data to merge -> compress merged data.
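
A rough sketch of what that per-replica sequence costs, again with zlib as a stand-in compressor and a hypothetical merge_on_replica helper (this is not how BlueStore GC is implemented, just the shape of the work):

```python
import zlib


def merge_on_replica(existing_blob, incoming_blob, overlay_offset):
    """Merge an incoming compressed write into an existing compressed blob.

    Two decompressions and one compression happen here, and every replica
    would have to repeat them.
    """
    old = bytearray(zlib.decompress(existing_blob))   # decompress data to merge
    new = zlib.decompress(incoming_blob)              # decompress original request
    end = overlay_offset + len(new)
    if end > len(old):
        old.extend(b"\0" * (end - len(old)))          # grow if the write extends the blob
    old[overlay_offset:end] = new                     # overlay the new data
    return zlib.compress(bytes(old))                  # compress merged data


existing = zlib.compress(b"a" * 0x2000)
incoming = zlib.compress(b"b" * 0x1000)
merged = merge_on_replica(existing, incoming, 0x1000)
```

The compression saved by preprocessing is spent again here, plus two decompressions, on each replica.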

Probably this limitation isn't that bad: IMHO it's better to have compressed blobs aligned with the original write requests.

Moreover, I have some ideas on how to get rid of the blob_depth notion, which would make life a bit easier. Will share shortly.

Any thoughts/comments?

Thanks,
Igor

