Re: Adding Data-At-Rest compression support to Ceph

As for me, that's the first time I've heard of it.

But if we introduce pluggable compression back-ends, it would be pretty easy to try.
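
For illustration, a minimal sketch of what such a pluggable back-end
interface might look like (the class and method names below are
hypothetical, not taken from the Ceph tree):

  // Hypothetical plugin interface; names are illustrative only.
  #include <cstdint>
  #include <string>
  #include <vector>

  class CompressorBackend {
  public:
    virtual ~CompressorBackend() = default;
    // Identifier used to select the back-end, e.g. "zlib" or "snappy".
    virtual std::string name() const = 0;
    // Compress `in` into `out`; return 0 on success, negative on error.
    virtual int compress(const std::vector<uint8_t>& in,
                         std::vector<uint8_t>& out) = 0;
    // Inverse of compress(); `out` receives the original bytes.
    virtual int decompress(const std::vector<uint8_t>& in,
                           std::vector<uint8_t>& out) = 0;
  };

A brotli back-end would then just be one more implementation of this
interface.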

Thanks,
Igor.

On 24.09.2015 18:41, HEWLETT, Paul (Paul) wrote:
Out of curiosity have you considered the Google compression algos:

http://google-opensource.blogspot.co.uk/2015/09/introducing-brotli-new-compression.html


Paul

On 24/09/2015 16:34, Sage Weil <sweil@xxxxxxxxxx> wrote:

On Thu, 24 Sep 2015, Igor Fedotov wrote:
On 23.09.2015 21:03, Gregory Farnum wrote:
On Wed, Sep 23, 2015 at 6:15 AM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
The idea of making the primary responsible for object compression really concerns me. It means, for instance, that a single random access will likely require access to multiple objects, and breaks many of the optimizations we have right now or in the pipeline (for instance: direct client access).
Could you please elaborate on why multiple object accesses would be required for a single random access?
It sounds to me like you were planning to take an incoming object
write, compress it, and then chunk it. If you do that, the symbols
("abcdefgh = a", "ijklmnop = b", etc) for the compression are likely
to reside in the first object and need to be fetched for each read in
other objects.
Gregory,
by the symbols "abcdefgh = a", etc., do you mean a kind of compressor dictionary here? And your assumption is that such a dictionary is built on the first write, saved, and then reused by all subsequent reads, right?
I think that's not the case - it's better to compress each write independently. Then there is no need to access the "dictionary" object (i.e. the first object with these symbols) on every read operation; the latter uses the compressed block data only.
Yes, this might affect the total compression ratio, but I think that's acceptable.
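
As a rough sketch of this independent-compression approach (using
zlib's one-shot API; error handling is trimmed and the helper names
are mine, not from any existing code):

  // Each write is compressed on its own, so a read never needs the
  // dictionary state of any earlier block.
  #include <zlib.h>
  #include <cstdint>
  #include <vector>

  std::vector<uint8_t> compress_block(const std::vector<uint8_t>& in) {
    uLongf out_len = compressBound(in.size());
    std::vector<uint8_t> out(out_len);
    if (compress2(out.data(), &out_len, in.data(), in.size(),
                  Z_DEFAULT_COMPRESSION) != Z_OK)
      return {};  // caller would store the block uncompressed instead
    out.resize(out_len);
    return out;
  }

  std::vector<uint8_t> decompress_block(const std::vector<uint8_t>& in,
                                        size_t orig_len) {
    std::vector<uint8_t> out(orig_len);
    uLongf out_len = orig_len;  // original size stored per block
    if (uncompress(out.data(), &out_len, in.data(), in.size()) != Z_OK)
      return {};
    out.resize(out_len);
    return out;
  }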
I was also assuming each stripe unit would be independently compressed,
but I didn't think about the efficiency.  This approach implies that
you'd want a relatively large stripe size (100s of KB or more).

Hmm, a quick google search suggests the zlib compression window is only
32KB anyway, which isn't so big.  The more aggressive algorithms
probably aren't what people would reach for anyway, for CPU utilization
reasons... I guess?
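
(For reference, the 32KB figure corresponds to zlib's maximum
windowBits of 15; a quick sanity check, assuming plain zlib:

  #include <zlib.h>
  #include <cstdio>

  int main() {
    z_stream strm = {};
    // windowBits 15 -> 2^15 = 32KB history window, zlib's maximum;
    // matches can never reach further back than this.
    if (deflateInit2(&strm, Z_DEFAULT_COMPRESSION, Z_DEFLATED,
                     15, 8, Z_DEFAULT_STRATEGY) != Z_OK)
      return 1;
    std::printf("window: %d bytes\n", 1 << 15);
    deflateEnd(&strm);
    return 0;
  }

so stripe units much larger than 32KB would buy little extra ratio
from zlib itself.)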

sage