Re: Inline dedup/compression

On Sat, Jun 27, 2015 at 2:03 AM, James (Fei) Liu-SSI
<james.liu@xxxxxxxxxxxxxxx> wrote:
> Hi Haomai,
>   Thanks for your response, as always. I agree that compression is the comparatively easier task, but it is still very challenging to implement, no matter where we do it. The client side (RBD, RGW, or CephFS) or the PG would be a somewhat better place to implement it in terms of efficiency and cost reduction, since the data gets compressed before it is replicated to the other OSDs. There are two reasons:
> 1. It keeps the data consistent among the OSDs in one PG.
> 2. It saves computing resources.
>
> IMHO, compression should happen before replication comes into play at the pool level. That said, we could also have a second level of compression in the local objectstore. As for the compression unit size, it really depends on the workload and on the layer where we implement it.
>
> Inline deduplication, on the other hand, gets dramatically more complex once we bring replication and erasure coding into the picture.
>
> Before we talk about implementation, though, it would be great to understand the pros and cons of inline dedup/compression. We all understand the benefits; the downsides are the performance hit and the extra computing resources required. It would be great to look at the problem from 30,000 feet and consider the whole picture for Ceph. Please correct me if I am wrong.

Actually, we may have some tricks to reduce the performance hit of compression. As Joe mentioned, we could compress only the replica PG data to avoid hurting foreground performance, but that increases the complexity of recovery and PG remapping. Another implementation option: if we start compressing data in the messenger, the OSD thread and PG thread won't need to touch the data for a normal client op, so we could run compression in parallel with PG processing. The journal thread would then receive the already-compressed data.
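
To make that concrete, here is a rough sketch of the messenger-side idea (a hypothetical helper, with zlib just as an example compressor; this is not actual Ceph code): the payload gets compressed once at the messenger boundary, and everything downstream, including the journal thread, only handles the opaque compressed blob.

#include <zlib.h>

#include <stdexcept>
#include <vector>

// Hypothetical helper (not actual Ceph code): compress a client
// payload once, at the messenger boundary.  OSD/PG threads then only
// pass the opaque blob along, and the journal persists it as-is.
std::vector<unsigned char> compress_payload(const std::vector<unsigned char>& in) {
    uLongf out_len = compressBound(in.size());
    std::vector<unsigned char> out(out_len);
    if (compress(out.data(), &out_len, in.data(), in.size()) != Z_OK)
        throw std::runtime_error("compress failed");
    out.resize(out_len);  // shrink to the actual compressed size
    return out;
}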

The effectiveness of compression is also a concern: compressing inside RADOS may not give the best compression ratio. If instead we compress in libcephfs, librbd, and radosgw, keeping RADOS unaware of compression, it may be simpler, and we get file-, block-, and object-level compression. Wouldn't that be better?
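
Something like this minimal sketch is what I mean (a toy std::map stand-in for the backend, not the real librbd/libcephfs/radosgw hooks): the client side compresses on write and decompresses on read, and the store underneath never knows.

#include <zlib.h>

#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a RADOS-like object store: oid -> bytes.
// put() compresses and get() decompresses, so the backend only ever
// stores opaque compressed blobs and stays unaware of compression.
class CompressingStore {
public:
    void put(const std::string& oid, const std::vector<unsigned char>& raw) {
        uLongf len = compressBound(raw.size());
        std::vector<unsigned char> out(len);
        if (compress(out.data(), &len, raw.data(), raw.size()) != Z_OK)
            throw std::runtime_error("compress failed");
        out.resize(len);
        raw_lens_[oid] = raw.size();   // real code: an object xattr
        backend_[oid] = std::move(out);
    }

    std::vector<unsigned char> get(const std::string& oid) {
        uLongf len = raw_lens_.at(oid);
        std::vector<unsigned char> raw(len);
        const auto& blob = backend_.at(oid);
        if (uncompress(raw.data(), &len, blob.data(), blob.size()) != Z_OK)
            throw std::runtime_error("uncompress failed");
        raw.resize(len);
        return raw;
    }

private:
    std::map<std::string, std::vector<unsigned char>> backend_;
    std::map<std::string, uLong> raw_lens_;
};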

As for dedup, my current idea is that we could set up a memory pool on the OSD side to store checksums. Then, on the client side, we map an object to a PG based on a hash of its data instead of its name, so identical objects always land on the OSD that is also responsible for their dedup storage. Dedup could then be distributed at the pool level as well.
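
A dependency-free sketch of that placement idea (hypothetical names; real dedup would want a strong hash such as SHA-256 rather than FNV-1a):

#include <cstdint>
#include <vector>

// Hypothetical placement sketch: pick the PG from a fingerprint of the
// object *data* instead of the object name, so identical objects always
// map to the same PG/OSD, and the dedup table local to that OSD can
// spot the duplicate.
uint64_t fingerprint(const std::vector<unsigned char>& data) {
    uint64_t h = 14695981039346656037ULL;   // FNV-1a offset basis
    for (unsigned char b : data) {
        h ^= b;
        h *= 1099511628211ULL;              // FNV-1a prime
    }
    return h;
}

uint32_t pg_for_object(const std::vector<unsigned char>& data, uint32_t pg_num) {
    return static_cast<uint32_t>(fingerprint(data) % pg_num);
}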


>
> By the way, software-defined storage startups like Hedvig and Springpath both provide inline dedup/compression. It is not an apples-to-apples comparison, but it is a useful reference: datacenters need cost-effective solutions.
>
> Regards,
> James
>
>
>
> -----Original Message-----
> From: Haomai Wang [mailto:haomaiwang@xxxxxxxxx]
> Sent: Thursday, June 25, 2015 8:08 PM
> To: James (Fei) Liu-SSI
> Cc: ceph-devel
> Subject: Re: Inline dedup/compression
>
> On Fri, Jun 26, 2015 at 6:01 AM, James (Fei) Liu-SSI <james.liu@xxxxxxxxxxxxxxx> wrote:
>> Hi Cephers,
>>     It is not easy to ask when Ceph will support inline dedup/compression across OSDs in RADOS, because it is neither an easy task nor an easy question to answer. Ceph provides replication and EC for performance and failure recovery, but we lose storage efficiency and pay the associated cost; the two goals somewhat contradict each other. Still, I am curious what other Cephers think about this question.
>>    Is there any plan among Cephers to do anything about inline dedup/compression, apart from the features provided by the local node itself, such as btrfs?
>
> Compression is easier to implement in RADOS than dedup. The most important question about compression is where we start compressing: the client, the PG, or the objectstore. Then we need to decide how large the compression unit should be. Of course, both compression and dedup would benefit from a keyvalue-style storage API, but I don't think it is difficult to build on the existing objectstore API.
>
> Dedup is more feasible to implement in the local OSD than across a whole pool or cluster; if we want dedup at the pool level, we need to do it from the client.
>
>>
>>   Regards,
>>   James
>
>
>
> --
> Best Regards,
>
> Wheat



-- 
Best Regards,

Wheat