> -----Original Message----- > From: ceph-devel-owner@xxxxxxxxxxxxxxx > [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Loic Dachary > Sent: Monday, August 4, 2014 8:37 PM > To: Andreas Joachim Peters > Cc: Ma, Jianpeng; Ceph Development > Subject: Re: ISA erasure code plugin and cache > > > > On 04/08/2014 14:15, Andreas Joachim Peters wrote:> Hi Loic, > > > > the background relevant to your comments have (unfortunately) never been > answered on the mailing list. > > > > The cache is written in a way, that it is useful for a fixed (k,m) combination > and thread-safe. > > > > So, if there is one instance of the plugin per pool, it is the right > implementation. If there is (as you write) one instance for each PG in any pool, > it is sort of 'stupid' because the same encoding table is stored for each PG > seperatly. > > I would not called it stupid ;-) Just not as efficient as if the cache was by all PGs. > Without cache the decode table has to be calculated for each object in the > placement group and there are a lot of objects. The table may be duplicated > hundreds of time so it has an impact on the memory footprint, but it should not > have a visible impact on the decode performances. An optimisation of your > implementation to save memory would be nice, but it is not critical. > AFAIK, for a recovery pg, all the objects of pg have the same lost chunks. So only the first object miss the cache. But the later won't. It only need a entry to cache. Or am I missing something? Jianpeng Ma > How large are the decode tables ? > > > So if the final statement is to create one plugin instance per PG, I will change > it accordingly and shared encoding & decoding tables for a fixed (k,m), if not it > can stay. > > > > Just need to know that ... this boils down to the fact, that encoding & > decoding should not be considered 'stateless'. > > > > Cheers Andreas. > > > > > > ________________________________________ > > From: Loic Dachary [loic@xxxxxxxxxxx] > > Sent: 04 August 2014 13:56 > > To: Andreas Joachim Peters > > Cc: Ma, Jianpeng; Ceph Development > > Subject: ISA erasure code plugin and cache > > > > Hi, > > > > Here is how I understand the current code: > > > > When an OSD is missing, recovery is required and the primary OSD will collect > the available chunks to do so. It will then call the decode method via > ECUtil::decode which is a small wrapper around the corresponding > ErasureCodeInterface::decode method. > > > > https://github.com/ceph/ceph/blob/master/src/osd/ECBackend.cc#L361 > > > > The ISA plugin will then use the isa_decode method to perform the work > > > > > https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode > Isa.cc#L212 > > > > and will be repeatedly called until all objects in the PGs that were relying on > the missing OSD are recovered. To avoid computing the decoding table for each > object, it is stored in a LRU cache > > > > > https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode > Isa.cc#L480 > > > > and copied in the stack if already there: > > > > > https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode > Isa.cc#L433 > > > > Each PG has a separate instance of ErasureCodeIsa, obtained when it is > created: > > > > https://github.com/ceph/ceph/blob/master/src/osd/PGBackend.cc#L292 > > > > It means that data members of each ErasureCodeIsa are copied as many > times as there are PGs. If an OSD handles participates in 200 PG that belong to > an erasure coded pool configured to use ErasureCodeIsa, the data members > will be duplicated 200 times. > > > > It is good practice to make it so that the encode/decode methods of > ErasureCodeIsa are thread safe. In the jerasure plugin these methods have no > side effect on the object. In the isa plugin the LRU cache storing the decode > tables is modified by the decode method and guarded by a mutex: > > > > get > https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode > Isa.cc#L281 > > put > https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode > Isa.cc#L310 > > > > Please correct me if I'm mistaken ;-) I've not reviewed the code yet and try to > find problems, I just wanted to make sure I get the intention before doing so. > > > > Cheers > > -- > > Loïc Dachary, Artisan Logiciel Libre > > > > -- > Loïc Dachary, Artisan Logiciel Libre -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html