Re: Inline dedup/compression

For compression, I prefer to implement it in the EC pool. It is much easier
there because objects in an EC pool are already striped, which is what we
have already finished (and are now testing). Also, only append writes are
allowed in EC, which makes the implementation convenient. Moreover, as
Haomai mentioned, compression in a replicated pool would incur large read
and write penalties and leave the data fragmented. The purpose of
compression is to save storage and bandwidth, so the trade-offs depend on
the service you provide. For a case like VDI, which involves lots of small
reads and writes, compression in the replicated pool is not a smart choice,
so we apply compression only in EC. If compression is needed in a
replicated pool, a kv-store such as rocksdb or leveldb (which compresses
internally) is a much better fit.
The implementation looks like:
Write:
      1) The EC pool copy_from()s the object from the hot pool.
      2) Compress the data stripe by stripe, and calculate the hash_info
for the object as well as the compress_info, which maintains the
(offset, length) pairs mapping the compressed object back to the content
of the original object (plus other metadata such as the compression
algorithm).
      3) Encode the compress_info into a bufferlist and apply it as a
setattr transaction.
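The write steps above can be sketched as follows. This is only an
illustrative sketch in Python, not Ceph code: the names compress_stripes
and the compress_info layout (an "extents" list of offset/length pairs
plus the algorithm name) are invented here, and zlib stands in for
whatever compressor is plugged in.

```python
# Hypothetical sketch of the write path: compress each stripe
# independently and record, for every stripe, the (offset, length) of the
# original data and of its compressed form. The returned compress_info is
# what step 3 would encode into a bufferlist and store via setattr.
import zlib

STRIPE_SIZE = 4096  # assumed stripe width, for illustration only


def compress_stripes(data: bytes, stripe_size: int = STRIPE_SIZE):
    """Compress per stripe; return (compressed blob, compress_info)."""
    compressed = bytearray()
    compress_info = {"alg": "zlib", "extents": []}
    for orig_off in range(0, len(data), stripe_size):
        stripe = data[orig_off:orig_off + stripe_size]
        comp = zlib.compress(stripe)
        compress_info["extents"].append({
            "orig_off": orig_off, "orig_len": len(stripe),
            "comp_off": len(compressed), "comp_len": len(comp),
        })
        compressed += comp
    return bytes(compressed), compress_info
```

Because each stripe is compressed independently, a later read of a byte
range only has to decompress the stripes that overlap it.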
Read (may be a proxy read or a promotion):
      Promotion:
          Copy the whole compressed object, together with its
compress_info, to the ReplicatedPG and submit a promotion write
transaction (like a normal write transaction, except the data must be
decompressed before it is written to the FileStore).
      Proxy read:
          Return the compress_info and the read content to the
replicated pool, decompress the content, select out the required data,
and send it back to the client.
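The proxy-read side can be sketched the same way. Again this is an
illustrative Python sketch, not Ceph code: read_range and the
compress_info layout (an "extents" list with orig_off/orig_len and
comp_off/comp_len fields, matching the write-path description above) are
assumptions, and zlib stands in for the real compressor.

```python
# Hypothetical proxy-read sketch: given the compressed blob and its
# compress_info, decompress only the stripes that overlap the requested
# [off, off + length) range, then slice out exactly the requested bytes.
import zlib


def read_range(compressed: bytes, compress_info: dict,
               off: int, length: int) -> bytes:
    out = bytearray()
    end = off + length
    for ext in compress_info["extents"]:
        o_start = ext["orig_off"]
        o_end = o_start + ext["orig_len"]
        if o_end <= off or o_start >= end:
            continue  # this stripe does not overlap the request
        stripe = zlib.decompress(
            compressed[ext["comp_off"]:ext["comp_off"] + ext["comp_len"]])
        lo = max(off, o_start) - o_start
        hi = min(end, o_end) - o_start
        out += stripe[lo:hi]
    return bytes(out)
```

This is why the (offset, length) pairs in compress_info matter: without
them, every read would have to decompress the whole object.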


Glad to hear a new idea from Chaitanya for doing pool-based dedup. It
seems the thought is to maintain a large number of manifest objects, just
like current rados objects, and distribute them among all OSDs. Am I
right? That lengthens the request path, but it reduces complexity.
Introducing a centralized in-memory index server cluster, by contrast,
could become too complex and hard to maintain and scale: it would end up
as a second MDS and could not be done well given the huge complexity.
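My understanding of the manifest idea can be sketched like this. All
names here (write_object, read_object, the dicts standing in for rados
objects and refcounts) are invented for illustration; the point is only
the shape of the design: per-object manifests of chunk fingerprints, with
chunks stored under their fingerprint and refcounted, distributed like
ordinary rados objects rather than held in a central index server.

```python
# Hypothetical sketch of pool-based dedup via manifest objects. Each
# logical object keeps a manifest listing the fingerprints of its chunks;
# the chunks themselves are stored once per fingerprint and refcounted.
import hashlib

CHUNK = 4096  # assumed fixed chunk size, for illustration

chunk_store = {}  # fingerprint -> chunk bytes (stands in for rados objects)
refcount = {}     # fingerprint -> number of manifest references


def write_object(name: str, data: bytes, manifests: dict) -> None:
    fps = []
    for i in range(0, len(data), CHUNK):
        chunk = data[i:i + CHUNK]
        fp = hashlib.sha256(chunk).hexdigest()
        if fp not in chunk_store:
            chunk_store[fp] = chunk  # first writer stores the chunk
        refcount[fp] = refcount.get(fp, 0) + 1
        fps.append(fp)
    manifests[name] = fps            # one manifest object per rados object


def read_object(name: str, manifests: dict) -> bytes:
    # The extra manifest lookup is the longer request path mentioned above.
    return b"".join(chunk_store[fp] for fp in manifests[name])
```

Since manifests and chunks are both ordinary objects, they scale out with
the OSDs, which is exactly what a centralized index server would not do.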