Hi Andreas, On 04/08/2014 22:07, Andreas Joachim Peters wrote: > Hi Loic, > brilliant! > > That's also a good compromise to share encoding tables and avoid locking for encoding. I'm glad you like it ;-) > If there is no upstream change envisaged, I will prepare a pull request according to your proposal. The decoding cache can be shared for each technique, k+m are intrinsic in the key's used for lookup e.g. we would have two LRU caches if both techniques are used and one encoding table for each (k+m) value in use. > The current setting let's one LRU cache grow to a maximum of 15 MB, however I would consider such an event as 'extremly' rare ... it is mainly triggered artificially using the benchmark tool. I'll work on adapting the isa plugin to use the common ErasureCode class but that should have little impact on what you'll do. Feel free to ignore this and proceed with the implementation. I'll rebase if necessary once you're done. Cheers > Cheers Andreas. > > ________________________________________ > From: Loic Dachary [loic@xxxxxxxxxxx] > Sent: 04 August 2014 19:04 > To: Andreas Joachim Peters > Cc: Ceph Development > Subject: Re: ISA erasure code plugin and cache > > Hi Andreas, > > What about getting a singleton containing the LRU cache in > > https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCodePluginIsa.cc#L48 > > and making it a parameter to the constructor > > https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCodePluginIsa.cc#L56 > > or the init method > > https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCodePluginIsa.cc#L70 > > I assume LRU cache can only be shared if the parameters are the same because it would not make sense to share the cache of an instance using the Cauchy technique with the cache of an instance using the Vandermonde technique. > > If the singleton is created in the plugin entry point: > > https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCodePluginIsa.cc#L77 > > there is no need to protect it against race conditions because the call to the entry point already is guarded: > > https://github.com/ceph/ceph/blob/master/src/erasure-code/ErasureCodePlugin.cc#L74 > > Cheers > > On 04/08/2014 17:22, Andreas Joachim Peters wrote: >> Could one not attach the EC instance to the "pg_pool_t" structure which is given by reference to PGBackend *PGBackend::build_pg_backend? Just a naive question ... i don't know the upstream code ... >> >> That would follow the spirit of 'saving/sharing resources' and would follow the logic that the CODEC configuration is by pool and not by PG ?!?!? >> >> Cheers Andreas. >> >> >> >> ________________________________________ >> From: ceph-devel-owner@xxxxxxxxxxxxxxx [ceph-devel-owner@xxxxxxxxxxxxxxx] on behalf of Sage Weil [sweil@xxxxxxxxxx] >> Sent: 04 August 2014 16:22 >> To: Andreas Joachim Peters >> Cc: Loic Dachary; Ma, Jianpeng; Ceph Development >> Subject: RE: ISA erasure code plugin and cache >> >> On Mon, 4 Aug 2014, Andreas Joachim Peters wrote: >>> Yes, >>> you are both right, it would have only the current failure situation in >>> each cache and it stores the encoding table in each plug-in instance. >>> Probably the sharing of cache and encoding table would add more >>> (unwanted) thread synchronization points. >>> >>> If this is the final design, I might remove also the lock? Nevertheless, >>> it has no measurable performance impact ... >> >> Perhaps, but I think the memory impact is significant, especially since >> people will want to use erasure coding for "cold" storage pools are >> underpowered hardware. I think we should adjust the strategy so that >> there is a single instance per pool, or (perhaps better) the cache is >> global to the plugin (and shared across pools that may share the same >> EC profile). >> >> sage >> >> >>> Cheers Andreas. >>> >>> >>> ________________________________________ >>> From: Loic Dachary [loic@xxxxxxxxxxx] >>> Sent: 04 August 2014 14:56 >>> To: Ma, Jianpeng; Andreas Joachim Peters >>> Cc: Ceph Development >>> Subject: Re: ISA erasure code plugin and cache >>> >>> On 04/08/2014 14:50, Ma, Jianpeng wrote: >>>>> -----Original Message----- >>>>> From: ceph-devel-owner@xxxxxxxxxxxxxxx >>>>> [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Loic Dachary >>>>> Sent: Monday, August 4, 2014 8:37 PM >>>>> To: Andreas Joachim Peters >>>>> Cc: Ma, Jianpeng; Ceph Development >>>>> Subject: Re: ISA erasure code plugin and cache >>>>> >>>>> >>>>> >>>>> On 04/08/2014 14:15, Andreas Joachim Peters wrote:> Hi Loic, >>>>>> >>>>>> the background relevant to your comments have (unfortunately) never been >>>>> answered on the mailing list. >>>>>> >>>>>> The cache is written in a way, that it is useful for a fixed (k,m) combination >>>>> and thread-safe. >>>>>> >>>>>> So, if there is one instance of the plugin per pool, it is the right >>>>> implementation. If there is (as you write) one instance for each PG in any pool, >>>>> it is sort of 'stupid' because the same encoding table is stored for each PG >>>>> seperatly. >>>>> >>>>> I would not called it stupid ;-) Just not as efficient as if the cache was by all PGs. >>>>> Without cache the decode table has to be calculated for each object in the >>>>> placement group and there are a lot of objects. The table may be duplicated >>>>> hundreds of time so it has an impact on the memory footprint, but it should not >>>>> have a visible impact on the decode performances. An optimisation of your >>>>> implementation to save memory would be nice, but it is not critical. >>>>> >>>> AFAIK, for a recovery pg, all the objects of pg have the same lost chunks. So only the first object miss the cache. >>>> But the later won't. It only need a entry to cache. >>>> Or am I missing something? >>> >>> It also is my understanding and that's what makes the cache so useful. Now, in the long run the cache stays and grows. Since it is a few mega bytes per PG, it will eventually has a non negligible impact on a long running OSD. But again, it's nothing critical to performances. >>> >>> Cheers >>> >>>> >>>> Jianpeng Ma >>>> >>>>> How large are the decode tables ? >>>>> >>>>>> So if the final statement is to create one plugin instance per PG, I will change >>>>> it accordingly and shared encoding & decoding tables for a fixed (k,m), if not it >>>>> can stay. >>>>>> >>>>>> Just need to know that ... this boils down to the fact, that encoding & >>>>> decoding should not be considered 'stateless'. >>>>>> >>>>>> Cheers Andreas. >>>>>> >>>>>> >>>>>> ________________________________________ >>>>>> From: Loic Dachary [loic@xxxxxxxxxxx] >>>>>> Sent: 04 August 2014 13:56 >>>>>> To: Andreas Joachim Peters >>>>>> Cc: Ma, Jianpeng; Ceph Development >>>>>> Subject: ISA erasure code plugin and cache >>>>>> >>>>>> Hi, >>>>>> >>>>>> Here is how I understand the current code: >>>>>> >>>>>> When an OSD is missing, recovery is required and the primary OSD will collect >>>>> the available chunks to do so. It will then call the decode method via >>>>> ECUtil::decode which is a small wrapper around the corresponding >>>>> ErasureCodeInterface::decode method. >>>>>> >>>>>> https://github.com/ceph/ceph/blob/master/src/osd/ECBackend.cc#L361 >>>>>> >>>>>> The ISA plugin will then use the isa_decode method to perform the work >>>>>> >>>>>> >>>>> https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode >>>>> Isa.cc#L212 >>>>>> >>>>>> and will be repeatedly called until all objects in the PGs that were relying on >>>>> the missing OSD are recovered. To avoid computing the decoding table for each >>>>> object, it is stored in a LRU cache >>>>>> >>>>>> >>>>> https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode >>>>> Isa.cc#L480 >>>>>> >>>>>> and copied in the stack if already there: >>>>>> >>>>>> >>>>> https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode >>>>> Isa.cc#L433 >>>>>> >>>>>> Each PG has a separate instance of ErasureCodeIsa, obtained when it is >>>>> created: >>>>>> >>>>>> https://github.com/ceph/ceph/blob/master/src/osd/PGBackend.cc#L292 >>>>>> >>>>>> It means that data members of each ErasureCodeIsa are copied as many >>>>> times as there are PGs. If an OSD handles participates in 200 PG that belong to >>>>> an erasure coded pool configured to use ErasureCodeIsa, the data members >>>>> will be duplicated 200 times. >>>>>> >>>>>> It is good practice to make it so that the encode/decode methods of >>>>> ErasureCodeIsa are thread safe. In the jerasure plugin these methods have no >>>>> side effect on the object. In the isa plugin the LRU cache storing the decode >>>>> tables is modified by the decode method and guarded by a mutex: >>>>>> >>>>>> get >>>>> https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode >>>>> Isa.cc#L281 >>>>>> put >>>>> https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode >>>>> Isa.cc#L310 >>>>>> >>>>>> Please correct me if I'm mistaken ;-) I've not reviewed the code yet and try to >>>>> find problems, I just wanted to make sure I get the intention before doing so. >>>>>> >>>>>> Cheers >>>>>> -- >>>>>> Lo?c Dachary, Artisan Logiciel Libre >>>>>> >>>>> >>>>> -- >>>>> Lo?c Dachary, Artisan Logiciel Libre >>>> >>> >>> -- >>> Lo?c Dachary, Artisan Logiciel Libre >>> >>> -- >>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >>> the body of a message to majordomo@xxxxxxxxxxxxxxx >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>> >>> >> -- >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >> the body of a message to majordomo@xxxxxxxxxxxxxxx >> More majordomo info at http://vger.kernel.org/majordomo-info.html-- >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >> the body of a message to majordomo@xxxxxxxxxxxxxxx >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> > > -- > Loïc Dachary, Artisan Logiciel Libre > > -- > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- Loïc Dachary, Artisan Logiciel Libre
Attachment:
signature.asc
Description: OpenPGP digital signature