Problem with customized crush rule for EC pool


 



Yes, my goal is to make it so that losing 3 OSDs does not lose data.

My 6 racks may not be in different rooms, but they use 6 different
switches, so I want my data to remain accessible when any switch is down
or unreachable. I think that's not an unrealistic requirement.
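
A quick sanity check of the arithmetic behind that goal (this only
restates the k=8, m=3 profile discussed below, nothing new):

    11 shards / 6 racks   -> at most 2 shards per rack
    1 rack (switch) down  -> at most 2 shards unavailable, at least 9 left
    9 >= k = 8            -> data is still readable and can be rebuilt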


Thanks!

LeiDong.

On 9/9/14, 10:02 PM, "Loic Dachary" <loic at dachary.org> wrote:

>
>
>On 09/09/2014 14:21, Lei Dong wrote:
>> Thanks, Loic!
>> 
>> Actually I've found that increasing choose_local_fallback_tries can
>> help (chooseleaf_tries helps less significantly), but I'm afraid that
>> when an OSD failure happens and a new acting set has to be found, it
>> may again fail to find enough racks. So I'm trying to find a more
>> guaranteed way in case of OSD failure.
>> 
>> My profile is nothing special other than k=8 m=3.
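>> 
>> For reference, I created the profile and pool roughly like this (from
>> memory, so the profile and pool names here are just placeholders):
>> 
>>     ceph osd erasure-code-profile set myprofile k=8 m=3
>>     ceph osd pool create ecpool <pg-num> <pgp-num> erasure myprofile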
>
>So your goal is to make it so that losing 3 OSDs simultaneously does not
>mean losing data. By forcing each rack to hold at most 2 OSDs for a given
>object, you make it so that losing a full rack does not mean losing data.
>Are these racks in the same room of the datacenter? In the event of a
>catastrophic failure that permanently destroys one rack, how realistic is
>it that the other racks are unharmed? If the rack is destroyed by fire
>and stands in a row with the other racks, there is a very high chance
>that the other racks will also be damaged. Note that I am neither a
>system architect nor a system administrator: I may be completely wrong
>;-) If it turns out that the probability of a single rack failing
>entirely and independently of the others is negligible, it may not be
>necessary to write a complex ruleset and the default ruleset can be used
>instead.
>
>My 2cts
> 
>> 
>> Thanks again!
>> 
>> Leidong
>> 
>> 
>> 
>> 
>> 
>>> On September 9, 2014, at 7:53 PM, "Loic Dachary" <loic at dachary.org> wrote:
>>>
>>> Hi,
>>>
>>> It is indeed possible that the mapping fails if there are just enough
>>> racks to match the constraint. And the probability of a bad mapping
>>> increases as the number of PGs increases, because more mappings are
>>> needed. You can tell CRUSH to try harder with
>>>
>>> step set_chooseleaf_tries 10
>>>
>>> Be careful though: increasing this number will change the mappings. It
>>> will not just fix the bad mappings you're seeing, it will also change
>>> mappings that succeeded with a lower value. Once you've set this
>>> parameter, it cannot be modified without changing the mappings again.
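>>>
>>> Concretely, the step goes near the top of the rule body. With the rule
>>> you posted it would look roughly like this (just a sketch; the rule
>>> name, ruleset id and min/max sizes are placeholders to adjust to your
>>> map):
>>>
>>>     rule ecpool_rule {
>>>             ruleset 1
>>>             type erasure
>>>             min_size 3
>>>             max_size 20
>>>             step set_chooseleaf_tries 10
>>>             step take default
>>>             step choose firstn 6 type rack
>>>             step chooseleaf indep 2 type osd
>>>             step emit
>>>     }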
>>>
>>> Would you mind sharing the erasure code profile you plan to work with?
>>>
>>> Cheers
>>>
>>>> On 09/09/2014 12:39, Lei Dong wrote:
>>>> Hi ceph users:
>>>>
>>>> I want to create a customized crush rule for my EC pool (with
>>>> replica_size = 11) to distribute the replicas across 6 different racks.
>>>>
>>>> I used the following rule at first:
>>>>
>>>>     step take default                  # root
>>>>     step choose firstn 6 type rack     # 6 racks; I have exactly 6 racks
>>>>     step chooseleaf indep 2 type osd   # 2 OSDs per rack
>>>>     step emit
>>>>
>>>> It looks fine and works fine when the PG num is small.
>>>> But when the PG num increases, there are always some PGs which cannot
>>>> span all 6 racks.
>>>> It looks like "step choose firstn 6 type rack" sometimes returns only
>>>> 5 racks.
>>>> After some investigation, I think it may be caused by collisions of
>>>> choices.
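>>>>
>>>> (This is easy to reproduce offline by testing the compiled crush map
>>>> with crushtool, roughly like the following; the file names and the
>>>> rule id are only examples:)
>>>>
>>>>     crushtool -d crushmap.bin -o crushmap.txt    # decompile, edit rule
>>>>     crushtool -c crushmap.txt -o crushmap.new    # recompile
>>>>     crushtool -i crushmap.new --test --rule 1 --num-rep 11 \
>>>>         --show-bad-mappings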
>>>>
>>>> Then I came up with another solution to avoid the collisions, like
>>>> this:
>>>>
>>>>     step take rack0
>>>>     step chooseleaf indep 2 type osd
>>>>     step emit
>>>>     step take rack1
>>>>     ...
>>>>     (manually take every rack)
>>>>
>>>> This won't cause rack collisions, because I specify each rack by name
>>>> up front. But the problem is that an OSD in rack0 will always be the
>>>> primary OSD, because I choose from rack0 first.
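>>>>
>>>> (Spelled out in full, this second rule looks roughly like the sketch
>>>> below; the rule name, ruleset id and min/max sizes are placeholders,
>>>> and note that 6 racks x 2 OSDs yields 12 candidates for the 11 shards.
>>>> The first OSD emitted, hence the primary, always comes from rack0.)
>>>>
>>>>     rule ec_by_rack {
>>>>             ruleset 2
>>>>             type erasure
>>>>             min_size 3
>>>>             max_size 20
>>>>             step take rack0
>>>>             step chooseleaf indep 2 type osd
>>>>             step emit
>>>>             step take rack1
>>>>             step chooseleaf indep 2 type osd
>>>>             step emit
>>>>             # ... and so on for rack2, rack3, rack4, rack5
>>>>     }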
>>>>
>>>> So the question is: what is the recommended way to meet such a need
>>>> (distributing 11 replicas evenly across 6 racks so that data survives
>>>> a rack failure)?
>>>>
>>>>
>>>> Thanks!
>>>> LeiDong
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> ceph-users mailing list
>>>> ceph-users at lists.ceph.com
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>> -- 
>>> Loïc Dachary, Artisan Logiciel Libre
>>>
>
>-- 
>Loïc Dachary, Artisan Logiciel Libre
>


