Yes, you are both right, it would have only the current failure situation in each cache and it stores the encoding table in each plug-in instance. Probably the sharing of cache and encoding table would add more (unwanted) thread synchronization points. If this is the final design, I might remove also the lock? Nevertheless, it has no measurable performance impact ... Cheers Andreas. ________________________________________ From: Loic Dachary [loic@xxxxxxxxxxx] Sent: 04 August 2014 14:56 To: Ma, Jianpeng; Andreas Joachim Peters Cc: Ceph Development Subject: Re: ISA erasure code plugin and cache On 04/08/2014 14:50, Ma, Jianpeng wrote: >> -----Original Message----- >> From: ceph-devel-owner@xxxxxxxxxxxxxxx >> [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Loic Dachary >> Sent: Monday, August 4, 2014 8:37 PM >> To: Andreas Joachim Peters >> Cc: Ma, Jianpeng; Ceph Development >> Subject: Re: ISA erasure code plugin and cache >> >> >> >> On 04/08/2014 14:15, Andreas Joachim Peters wrote:> Hi Loic, >>> >>> the background relevant to your comments have (unfortunately) never been >> answered on the mailing list. >>> >>> The cache is written in a way, that it is useful for a fixed (k,m) combination >> and thread-safe. >>> >>> So, if there is one instance of the plugin per pool, it is the right >> implementation. If there is (as you write) one instance for each PG in any pool, >> it is sort of 'stupid' because the same encoding table is stored for each PG >> seperatly. >> >> I would not called it stupid ;-) Just not as efficient as if the cache was by all PGs. >> Without cache the decode table has to be calculated for each object in the >> placement group and there are a lot of objects. The table may be duplicated >> hundreds of time so it has an impact on the memory footprint, but it should not >> have a visible impact on the decode performances. An optimisation of your >> implementation to save memory would be nice, but it is not critical. >> > AFAIK, for a recovery pg, all the objects of pg have the same lost chunks. So only the first object miss the cache. > But the later won't. It only need a entry to cache. > Or am I missing something? It also is my understanding and that's what makes the cache so useful. Now, in the long run the cache stays and grows. Since it is a few mega bytes per PG, it will eventually has a non negligible impact on a long running OSD. But again, it's nothing critical to performances. Cheers > > Jianpeng Ma > >> How large are the decode tables ? >> >>> So if the final statement is to create one plugin instance per PG, I will change >> it accordingly and shared encoding & decoding tables for a fixed (k,m), if not it >> can stay. >>> >>> Just need to know that ... this boils down to the fact, that encoding & >> decoding should not be considered 'stateless'. >>> >>> Cheers Andreas. >>> >>> >>> ________________________________________ >>> From: Loic Dachary [loic@xxxxxxxxxxx] >>> Sent: 04 August 2014 13:56 >>> To: Andreas Joachim Peters >>> Cc: Ma, Jianpeng; Ceph Development >>> Subject: ISA erasure code plugin and cache >>> >>> Hi, >>> >>> Here is how I understand the current code: >>> >>> When an OSD is missing, recovery is required and the primary OSD will collect >> the available chunks to do so. It will then call the decode method via >> ECUtil::decode which is a small wrapper around the corresponding >> ErasureCodeInterface::decode method. >>> >>> https://github.com/ceph/ceph/blob/master/src/osd/ECBackend.cc#L361 >>> >>> The ISA plugin will then use the isa_decode method to perform the work >>> >>> >> https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode >> Isa.cc#L212 >>> >>> and will be repeatedly called until all objects in the PGs that were relying on >> the missing OSD are recovered. To avoid computing the decoding table for each >> object, it is stored in a LRU cache >>> >>> >> https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode >> Isa.cc#L480 >>> >>> and copied in the stack if already there: >>> >>> >> https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode >> Isa.cc#L433 >>> >>> Each PG has a separate instance of ErasureCodeIsa, obtained when it is >> created: >>> >>> https://github.com/ceph/ceph/blob/master/src/osd/PGBackend.cc#L292 >>> >>> It means that data members of each ErasureCodeIsa are copied as many >> times as there are PGs. If an OSD handles participates in 200 PG that belong to >> an erasure coded pool configured to use ErasureCodeIsa, the data members >> will be duplicated 200 times. >>> >>> It is good practice to make it so that the encode/decode methods of >> ErasureCodeIsa are thread safe. In the jerasure plugin these methods have no >> side effect on the object. In the isa plugin the LRU cache storing the decode >> tables is modified by the decode method and guarded by a mutex: >>> >>> get >> https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode >> Isa.cc#L281 >>> put >> https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode >> Isa.cc#L310 >>> >>> Please correct me if I'm mistaken ;-) I've not reviewed the code yet and try to >> find problems, I just wanted to make sure I get the intention before doing so. >>> >>> Cheers >>> -- >>> Loïc Dachary, Artisan Logiciel Libre >>> >> >> -- >> Loïc Dachary, Artisan Logiciel Libre > -- Loïc Dachary, Artisan Logiciel Libre -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html