Re: Inline dedup/compression

We discuss this periodically but not in any great depth. Compression
and dedupe are both best performed at a single point with some sort of
global knowledge, which is very antithetical to Ceph's design.
Blue-sky discussions for dedupe generally center on trying out
some kind of CAS system, with redirects from named objects that are
just indexes of CAS-addressed objects storing the actual object
data. But that introduces redirect latency, and I'm not
super-confident we'd get much savings outside of scenarios like
RBD, where we already capture most of the benefit with the gold
master "parent" images. :/

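To make that concrete, here's a rough sketch of that CAS layout as
toy Python (the names and the fixed-size chunking are made up, not
anything that exists): the named object is just a manifest of
content hashes, and every read pays the extra hop through the
manifest.

import hashlib

# Toy stand-ins for two pools: one holds CAS chunks, one manifests.
cas_pool = {}    # sha256 hex digest -> chunk bytes
name_pool = {}   # object name -> ordered list of digests (the index)
CHUNK = 4 * 1024 * 1024

def write_object(name, data):
    manifest = []
    for off in range(0, len(data), CHUNK):
        chunk = data[off:off + CHUNK]
        digest = hashlib.sha256(chunk).hexdigest()
        cas_pool.setdefault(digest, chunk)  # identical chunks stored once
        manifest.append(digest)
    name_pool[name] = manifest

def read_object(name):
    # The redirect cost: fetch the manifest first, then each CAS chunk.
    return b"".join(cas_pool[d] for d in name_pool[name])

Refcounting or GC for chunks no longer referenced by any manifest is
the hard part this sketch conveniently skips.
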
Compression we discuss less often. You can do it:
a) at the level of an OSD, in which case why would we bother
implementing it ourselves instead of just stacking on top of some
other compression system?
b) at the level of the clients (RBD, RGW, CephFS), before the data
transitions into RADOS objects. But that means you generally can't
do things like reading portions of an object to satisfy partial
reads, and ideas like striping strategies stop making much sense
(toy sketch below).
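To illustrate the partial-read problem in (b), a toy sketch (not
anything in Ceph; the names and the 64KB unit are invented): if the
client compresses in fixed-size units instead of whole objects,
range reads stay possible, at some cost in compression ratio.

import zlib

UNIT = 64 * 1024  # smaller units: cheaper partial reads, worse ratio

def compress_units(data):
    # Compress each unit independently so ranges can still be served.
    return [zlib.compress(data[off:off + UNIT])
            for off in range(0, len(data), UNIT)]

def read_range(units, offset, length):
    # Decompress only the units the requested range actually touches.
    first, last = offset // UNIT, (offset + length - 1) // UNIT
    raw = b"".join(zlib.decompress(units[i]) for i in range(first, last + 1))
    skip = offset - first * UNIT
    return raw[skip:skip + length]
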
-Greg

On Fri, Jun 26, 2015 at 7:03 PM, James (Fei) Liu-SSI
<james.liu@xxxxxxxxxxxxxxx> wrote:
> Hi Haomai,
>   Thanks for your response as always. I agree compression is a comparably easier task, but it is still very challenging to implement no matter where we put it. The client side (RBD, RGW, or CephFS) or the PG would be a somewhat better place to implement it, in terms of efficiency and cost reduction, since compression there happens before the data is replicated to the other OSDs. There are two reasons:
> 1. It keeps the data consistent among the OSDs in one PG.
> 2. It saves computing resources.
>
> IMHO, compression should be done before replication comes into play at the pool level. However, we can also have a second level of compression in the local objectstore. As for the compression unit size, it really depends on the workload and on which layer we implement it in.
>
> As for inline deduplication, the complexity will increase dramatically once we bring replication and erasure coding into consideration.
>
> However, before we talk about implementation, it would be great to understand the pros and cons of implementing inline dedupe/compression. We all understand the benefits of dedupe/compression; the side effects are a performance hit and a need for more computing resources. It would be great to look at the problem from 30,000 feet and see the whole picture for Ceph. Please correct me if I am wrong.
>
> By the way, software-defined storage startups like Hedvig and Springpath both provide inline dedupe/compression. It is not an apples-to-apples comparison, but it is a good reference; datacenters need cost-effective solutions.
>
> Regards,
> James
>
>
>
> -----Original Message-----
> From: Haomai Wang [mailto:haomaiwang@xxxxxxxxx]
> Sent: Thursday, June 25, 2015 8:08 PM
> To: James (Fei) Liu-SSI
> Cc: ceph-devel
> Subject: Re: Inline dedup/compression
>
> On Fri, Jun 26, 2015 at 6:01 AM, James (Fei) Liu-SSI <james.liu@xxxxxxxxxxxxxxx> wrote:
>> Hi Cephers,
>>     It is not easy to ask when Ceph is going to support inline dedup/compression across OSDs in RADOS, because it is neither an easy task nor an easy question to answer. Ceph provides replication and EC for performance and failure recovery, but with them we lose storage efficiency and pay the associated cost, so the two goals somewhat contradict each other. Still, I am curious how other Cephers think about this question.
>>    Are there any plans among Cephers to do anything regarding inline dedupe/compression, beyond the features brought by the local node itself, such as BTRFS?
>
> Compression is easier to implement in RADOS than dedup. The most important question about compression is where we begin to compress: the client, the PG, or the objectstore. Then we need to decide how large the compression unit is. Of course, both compression and dedup would prefer a keyvalue-alike storage API, but I think it's not difficult to build on the existing objectstore API (rough sketch below).
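> A rough sketch of that, as toy Python (made-up names, not the real
> objectstore interface): pick a compress unit, write each unit
> through a plain object write path, and keep a keyvalue-alike extent
> map on the side.
>
> import zlib
>
> UNIT = 64 * 1024       # the compress unit we have to choose
> extent_map = {}        # keyvalue-alike metadata: (name, offset) -> key
>
> def write_compressed(store, name, data):
>     # Each unit becomes one compressed blob behind an ordinary write.
>     for off in range(0, len(data), UNIT):
>         key = "%s/%d" % (name, off)
>         store[key] = zlib.compress(data[off:off + UNIT])
>         extent_map[(name, off)] = key
>
> def read_at(store, name, offset):
>     # Decompress only the unit covering the requested offset.
>     base = (offset // UNIT) * UNIT
>     return zlib.decompress(store[extent_map[(name, base)]])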
>
> Dedup is more feasible to implement in the local OSD than across the whole pool or cluster; if we want dedup at the pool level, we need to do it from the client.
>
>>
>>   Regards,
>>   James
>
>
>
> --
> Best Regards,
>
> Wheat