RE: ISA erasure code plugin and cache

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, 4 Aug 2014, Andreas Joachim Peters wrote:
> Yes, 
> you are both right, it would have only the current failure situation in 
> each cache and it stores the encoding table in each plug-in instance.
> Probably the sharing of cache and encoding table would add more 
> (unwanted) thread synchronization points.
> 
> If this is the final design, I might remove also the lock? Nevertheless, 
> it has no measurable performance impact ...

Perhaps, but I think the memory impact is significant, especially since 
people will want to use erasure coding for "cold" storage pools are 
underpowered hardware.  I think we should adjust the strategy so that 
there is a single instance per pool, or (perhaps better) the cache is 
global to the plugin (and shared across pools that may share the same 
EC profile).

sage


> Cheers Andreas.
> 
> 
> ________________________________________
> From: Loic Dachary [loic@xxxxxxxxxxx]
> Sent: 04 August 2014 14:56
> To: Ma, Jianpeng; Andreas Joachim Peters
> Cc: Ceph Development
> Subject: Re: ISA erasure code plugin and cache
> 
> On 04/08/2014 14:50, Ma, Jianpeng wrote:
> >> -----Original Message-----
> >> From: ceph-devel-owner@xxxxxxxxxxxxxxx
> >> [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Loic Dachary
> >> Sent: Monday, August 4, 2014 8:37 PM
> >> To: Andreas Joachim Peters
> >> Cc: Ma, Jianpeng; Ceph Development
> >> Subject: Re: ISA erasure code plugin and cache
> >>
> >>
> >>
> >> On 04/08/2014 14:15, Andreas Joachim Peters wrote:> Hi Loic,
> >>>
> >>> the background relevant to your comments have (unfortunately) never been
> >> answered on the mailing list.
> >>>
> >>> The cache is written in a way, that it is useful for a fixed (k,m) combination
> >> and thread-safe.
> >>>
> >>> So, if there is one instance of the plugin per pool, it is the right
> >> implementation. If there is (as you write) one instance for each PG in any pool,
> >> it is sort of 'stupid' because the same encoding table is stored for each PG
> >> seperatly.
> >>
> >> I would not called it stupid ;-) Just not as efficient as if the cache was by all PGs.
> >> Without cache the decode table has to be calculated for each object in the
> >> placement group and there are a lot of objects. The table may be duplicated
> >> hundreds of time so it has an impact on the memory footprint, but it should not
> >> have a visible impact on the decode performances. An optimisation of your
> >> implementation to save memory would be nice, but it is not critical.
> >>
> > AFAIK, for a recovery pg, all the objects of pg have the same lost chunks. So only the first object miss the cache.
> > But the later won't.  It only need a entry to cache.
> > Or am I missing something?
> 
> It also is my understanding and that's what makes the cache so useful. Now, in the long run the cache stays and grows. Since it is a few mega bytes per PG, it will eventually has a non negligible impact on a long running OSD. But again, it's nothing critical to performances.
> 
> Cheers
> 
> >
> > Jianpeng Ma
> >
> >> How large are the decode tables ?
> >>
> >>> So if the final statement is to create one plugin instance per PG, I will change
> >> it accordingly and shared encoding & decoding tables for a fixed (k,m), if not it
> >> can stay.
> >>>
> >>> Just need to know that ... this boils down to the fact, that encoding &
> >> decoding should not be considered 'stateless'.
> >>>
> >>> Cheers Andreas.
> >>>
> >>>
> >>> ________________________________________
> >>> From: Loic Dachary [loic@xxxxxxxxxxx]
> >>> Sent: 04 August 2014 13:56
> >>> To: Andreas Joachim Peters
> >>> Cc: Ma, Jianpeng; Ceph Development
> >>> Subject: ISA erasure code plugin and cache
> >>>
> >>> Hi,
> >>>
> >>> Here is how I understand the current code:
> >>>
> >>> When an OSD is missing, recovery is required and the primary OSD will collect
> >> the available chunks to do so. It will then call the decode method via
> >> ECUtil::decode which is a small wrapper around the corresponding
> >> ErasureCodeInterface::decode method.
> >>>
> >>>    https://github.com/ceph/ceph/blob/master/src/osd/ECBackend.cc#L361
> >>>
> >>> The ISA plugin will then use the isa_decode method to perform the work
> >>>
> >>>
> >> https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode
> >> Isa.cc#L212
> >>>
> >>> and will be repeatedly called until all objects in the PGs that were relying on
> >> the missing OSD are recovered. To avoid computing the decoding table for each
> >> object, it is stored in a LRU cache
> >>>
> >>>
> >> https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode
> >> Isa.cc#L480
> >>>
> >>> and copied in the stack if already there:
> >>>
> >>>
> >> https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode
> >> Isa.cc#L433
> >>>
> >>> Each PG has a separate instance of ErasureCodeIsa, obtained when it is
> >> created:
> >>>
> >>>    https://github.com/ceph/ceph/blob/master/src/osd/PGBackend.cc#L292
> >>>
> >>> It means that data members of each ErasureCodeIsa are copied as many
> >> times as there are PGs. If an OSD handles participates in 200 PG that belong to
> >> an erasure coded pool configured to use ErasureCodeIsa, the data members
> >> will be duplicated 200 times.
> >>>
> >>> It is good practice to make it so that the encode/decode methods of
> >> ErasureCodeIsa are thread safe. In the jerasure plugin these methods have no
> >> side effect on the object. In the isa plugin the LRU cache storing the decode
> >> tables is modified by the decode method and guarded by a mutex:
> >>>
> >>>    get
> >> https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode
> >> Isa.cc#L281
> >>>    put
> >> https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode
> >> Isa.cc#L310
> >>>
> >>> Please correct me if I'm mistaken ;-) I've not reviewed the code yet and try to
> >> find problems, I just wanted to make sure I get the intention before doing so.
> >>>
> >>> Cheers
> >>> --
> >>> Lo?c Dachary, Artisan Logiciel Libre
> >>>
> >>
> >> --
> >> Lo?c Dachary, Artisan Logiciel Libre
> >
> 
> --
> Lo?c Dachary, Artisan Logiciel Libre
> 
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [CEPH Users]     [Ceph Large]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux