Out of curiosity, have you considered the Google compression algos?
http://google-opensource.blogspot.co.uk/2015/09/introducing-brotli-new-compression.html

Paul

On 24/09/2015 16:34, "ceph-devel-owner@xxxxxxxxxxxxxxx on behalf of Sage Weil"
<ceph-devel-owner@xxxxxxxxxxxxxxx on behalf of sweil@xxxxxxxxxx> wrote:

>On Thu, 24 Sep 2015, Igor Fedotov wrote:
>> On 23.09.2015 21:03, Gregory Farnum wrote:
>> > On Wed, Sep 23, 2015 at 6:15 AM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
>> > > > > The idea of making the primary responsible for object compression
>> > > > > really concerns me. It means for instance that a single random access
>> > > > > will likely require access to multiple objects, and breaks many of the
>> > > > > optimizations we have right now or in the pipeline (for instance:
>> > > > > direct client access).
>> > > Could you please elaborate why access to multiple objects is required
>> > > on a single random access?
>> > It sounds to me like you were planning to take an incoming object
>> > write, compress it, and then chunk it. If you do that, the symbols
>> > ("abcdefgh = a", "ijklmnop = b", etc) for the compression are likely
>> > to reside in the first object and need to be fetched for each read in
>> > other objects.
>> Gregory,
>> do you mean a kind of compressor dictionary by the symbols "abcdefgh = a",
>> etc. here?
>> And is your assumption that such a dictionary is made on the first write,
>> saved, and reused by any subsequent reads?
>> I think that's not the case - it's better to compress each write
>> independently. Thus there is no need to access the "dictionary" object (i.e.
>> the first object with these symbols) on every read operation; the latter uses
>> the compressed block data only.
>> Yes, this might affect the total compression ratio, but I think that's
>> acceptable.
>
>I was also assuming each stripe unit would be independently compressed,
>but I didn't think about the efficiency. This approach implies that
>you'd want a relatively large stripe size (100s of KB or more).
>
>Hmm, a quick google search suggests the zlib compression window is only
>32KB anyway, which isn't so big. The more aggressive algorithms probably
>aren't what people would reach for anyway for CPU utilization reasons... I
>guess?
>
>sage
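
To make the "compress each write independently" idea concrete, here is a
minimal sketch (not Ceph code; the stripe unit size, function names, and use
of plain zlib are illustrative assumptions only) showing each stripe unit
compressed as its own self-contained stream, so a random read touches only
the one compressed unit it needs:

// Sketch only: compress each stripe unit independently with zlib so any
// unit can be decompressed on its own, without fetching a shared
// dictionary object. Stripe size and compression level are illustrative.
#include <zlib.h>
#include <algorithm>
#include <stdexcept>
#include <string>
#include <vector>

static const size_t STRIPE_UNIT = 256 * 1024;  // hypothetical 256 KB unit

// Compress one stripe unit; the output deflate stream is self-contained.
std::string compress_unit(const char *data, size_t len) {
  uLongf out_len = compressBound(len);
  std::string out(out_len, '\0');
  if (compress2(reinterpret_cast<Bytef*>(&out[0]), &out_len,
                reinterpret_cast<const Bytef*>(data), len,
                Z_DEFAULT_COMPRESSION) != Z_OK)
    throw std::runtime_error("compress2 failed");
  out.resize(out_len);
  return out;
}

// Split an incoming write into stripe units and compress each separately;
// a later random read of unit i only needs compressed[i].
std::vector<std::string> compress_write(const std::string &buf) {
  std::vector<std::string> units;
  for (size_t off = 0; off < buf.size(); off += STRIPE_UNIT) {
    size_t len = std::min(STRIPE_UNIT, buf.size() - off);
    units.push_back(compress_unit(buf.data() + off, len));
  }
  return units;
}

Because every unit carries its own deflate stream (and its own 32 KB window),
there is no shared dictionary to fetch on reads; the trade-off is a somewhat
lower overall compression ratio, especially for small stripe units.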