RE: ISA erasure code plugin and cache

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Yes, 
you are both right, it would have only the current failure situation in each cache and it stores the encoding table in each plug-in instance.
Probably the sharing of cache and encoding table would add more (unwanted) thread synchronization points. 

If this is the final design, I might remove also the lock? Nevertheless, it has no measurable performance impact ...

Cheers Andreas.


________________________________________
From: Loic Dachary [loic@xxxxxxxxxxx]
Sent: 04 August 2014 14:56
To: Ma, Jianpeng; Andreas Joachim Peters
Cc: Ceph Development
Subject: Re: ISA erasure code plugin and cache

On 04/08/2014 14:50, Ma, Jianpeng wrote:
>> -----Original Message-----
>> From: ceph-devel-owner@xxxxxxxxxxxxxxx
>> [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Loic Dachary
>> Sent: Monday, August 4, 2014 8:37 PM
>> To: Andreas Joachim Peters
>> Cc: Ma, Jianpeng; Ceph Development
>> Subject: Re: ISA erasure code plugin and cache
>>
>>
>>
>> On 04/08/2014 14:15, Andreas Joachim Peters wrote:> Hi Loic,
>>>
>>> the background relevant to your comments have (unfortunately) never been
>> answered on the mailing list.
>>>
>>> The cache is written in a way, that it is useful for a fixed (k,m) combination
>> and thread-safe.
>>>
>>> So, if there is one instance of the plugin per pool, it is the right
>> implementation. If there is (as you write) one instance for each PG in any pool,
>> it is sort of 'stupid' because the same encoding table is stored for each PG
>> seperatly.
>>
>> I would not called it stupid ;-) Just not as efficient as if the cache was by all PGs.
>> Without cache the decode table has to be calculated for each object in the
>> placement group and there are a lot of objects. The table may be duplicated
>> hundreds of time so it has an impact on the memory footprint, but it should not
>> have a visible impact on the decode performances. An optimisation of your
>> implementation to save memory would be nice, but it is not critical.
>>
> AFAIK, for a recovery pg, all the objects of pg have the same lost chunks. So only the first object miss the cache.
> But the later won't.  It only need a entry to cache.
> Or am I missing something?

It also is my understanding and that's what makes the cache so useful. Now, in the long run the cache stays and grows. Since it is a few mega bytes per PG, it will eventually has a non negligible impact on a long running OSD. But again, it's nothing critical to performances.

Cheers

>
> Jianpeng Ma
>
>> How large are the decode tables ?
>>
>>> So if the final statement is to create one plugin instance per PG, I will change
>> it accordingly and shared encoding & decoding tables for a fixed (k,m), if not it
>> can stay.
>>>
>>> Just need to know that ... this boils down to the fact, that encoding &
>> decoding should not be considered 'stateless'.
>>>
>>> Cheers Andreas.
>>>
>>>
>>> ________________________________________
>>> From: Loic Dachary [loic@xxxxxxxxxxx]
>>> Sent: 04 August 2014 13:56
>>> To: Andreas Joachim Peters
>>> Cc: Ma, Jianpeng; Ceph Development
>>> Subject: ISA erasure code plugin and cache
>>>
>>> Hi,
>>>
>>> Here is how I understand the current code:
>>>
>>> When an OSD is missing, recovery is required and the primary OSD will collect
>> the available chunks to do so. It will then call the decode method via
>> ECUtil::decode which is a small wrapper around the corresponding
>> ErasureCodeInterface::decode method.
>>>
>>>    https://github.com/ceph/ceph/blob/master/src/osd/ECBackend.cc#L361
>>>
>>> The ISA plugin will then use the isa_decode method to perform the work
>>>
>>>
>> https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode
>> Isa.cc#L212
>>>
>>> and will be repeatedly called until all objects in the PGs that were relying on
>> the missing OSD are recovered. To avoid computing the decoding table for each
>> object, it is stored in a LRU cache
>>>
>>>
>> https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode
>> Isa.cc#L480
>>>
>>> and copied in the stack if already there:
>>>
>>>
>> https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode
>> Isa.cc#L433
>>>
>>> Each PG has a separate instance of ErasureCodeIsa, obtained when it is
>> created:
>>>
>>>    https://github.com/ceph/ceph/blob/master/src/osd/PGBackend.cc#L292
>>>
>>> It means that data members of each ErasureCodeIsa are copied as many
>> times as there are PGs. If an OSD handles participates in 200 PG that belong to
>> an erasure coded pool configured to use ErasureCodeIsa, the data members
>> will be duplicated 200 times.
>>>
>>> It is good practice to make it so that the encode/decode methods of
>> ErasureCodeIsa are thread safe. In the jerasure plugin these methods have no
>> side effect on the object. In the isa plugin the LRU cache storing the decode
>> tables is modified by the decode method and guarded by a mutex:
>>>
>>>    get
>> https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode
>> Isa.cc#L281
>>>    put
>> https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode
>> Isa.cc#L310
>>>
>>> Please correct me if I'm mistaken ;-) I've not reviewed the code yet and try to
>> find problems, I just wanted to make sure I get the intention before doing so.
>>>
>>> Cheers
>>> --
>>> Loïc Dachary, Artisan Logiciel Libre
>>>
>>
>> --
>> Loïc Dachary, Artisan Logiciel Libre
>

--
Loïc Dachary, Artisan Logiciel Libre

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [CEPH Users]     [Ceph Large]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux