Out of curiosity, have you considered the Google compression algos?
http://google-opensource.blogspot.co.uk/2015/09/introducing-brotli-new-compression.html

Paul

On 24/09/2015 16:34, "ceph-devel-owner@xxxxxxxxxxxxxxx on behalf of Sage Weil"
<ceph-devel-owner@xxxxxxxxxxxxxxx on behalf of sweil@xxxxxxxxxx> wrote:

>On Thu, 24 Sep 2015, Igor Fedotov wrote:
>> On 23.09.2015 21:03, Gregory Farnum wrote:
>> > On Wed, Sep 23, 2015 at 6:15 AM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
>> > > > > The idea of making the primary responsible for object compression
>> > > > > really concerns me. It means for instance that a single random access
>> > > > > will likely require access to multiple objects, and breaks many of the
>> > > > > optimizations we have right now or in the pipeline (for instance:
>> > > > > direct client access).
>> > > Could you please elaborate why access to multiple objects is required
>> > > on a single random access?
>> > It sounds to me like you were planning to take an incoming object
>> > write, compress it, and then chunk it. If you do that, the symbols
>> > ("abcdefgh = a", "ijklmnop = b", etc) for the compression are likely
>> > to reside in the first object and need to be fetched for each read in
>> > other objects.
>> Gregory,
>> do you mean a kind of compressor dictionary by the symbols "abcdefgh = a",
>> etc. here?
>> And is your assumption that such a dictionary is made on the first write,
>> saved, and reused by any subsequent reads?
>> I think that's not the case - it's better to compress each write
>> independently. Thus there is no need to access the "dictionary" object (i.e.
>> the first object with these symbols) on every read operation; the latter uses
>> the compressed block data only.
>> Yes, this might affect the total compression ratio, but I think that's
>> acceptable.
>
>I was also assuming each stripe unit would be independently compressed,
>but I didn't think about the efficiency. This approach implies that
>you'd want a relatively large stripe size (100s of KB or more).
>
>Hmm, a quick google search suggests the zlib compression window is only
>32KB anyway, which isn't so big. The more aggressive algorithms probably
>aren't what people would reach for anyway for CPU utilization reasons... I
>guess?
>
>sage
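
To make the "compress each write independently" idea concrete, here is a
minimal sketch (not Ceph code; the stripe unit size, function names, and use
of plain zlib are illustrative assumptions only) showing each stripe unit
compressed as its own self-contained stream, so a random read touches only
the one compressed unit it needs:

// Sketch only: compress each stripe unit independently with zlib so any
// unit can be decompressed on its own, without fetching a shared
// dictionary object. Stripe size and compression level are illustrative.
#include <zlib.h>
#include <algorithm>
#include <stdexcept>
#include <string>
#include <vector>

static const size_t STRIPE_UNIT = 256 * 1024;  // hypothetical 256 KB unit

// Compress one stripe unit; the output deflate stream is self-contained.
std::string compress_unit(const char *data, size_t len) {
  uLongf out_len = compressBound(len);
  std::string out(out_len, '\0');
  if (compress2(reinterpret_cast<Bytef*>(&out[0]), &out_len,
                reinterpret_cast<const Bytef*>(data), len,
                Z_DEFAULT_COMPRESSION) != Z_OK)
    throw std::runtime_error("compress2 failed");
  out.resize(out_len);
  return out;
}

// Split an incoming write into stripe units and compress each separately;
// a later random read of unit i only needs compressed[i].
std::vector<std::string> compress_write(const std::string &buf) {
  std::vector<std::string> units;
  for (size_t off = 0; off < buf.size(); off += STRIPE_UNIT) {
    size_t len = std::min(STRIPE_UNIT, buf.size() - off);
    units.push_back(compress_unit(buf.data() + off, len));
  }
  return units;
}

Because every unit carries its own deflate stream (and its own 32 KB window),
there is no shared dictionary to fetch on reads; the trade-off is a somewhat
lower overall compression ratio, especially for small stripe units.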