Problem with customized crush rule for EC pool

On 09/09/2014 14:21, Lei Dong wrote:
> Thanks Loic!
> 
> Actually I've found that increasing choose_local_fallback_tries can help (chooseleaf_tries helps, but not as significantly). However, I'm afraid that when an OSD failure happens and a new acting set needs to be found, it may again fail to find enough racks. So I'm trying to find a more guaranteed way to handle OSD failure.
> 
> My profile is nothing special other than k=8 m=3. 

So your goal is to make sure that losing 3 OSDs simultaneously does not mean losing data. By forcing each rack to hold at most 2 OSDs for a given object, you also make it so that losing a full rack does not mean losing data. Are these racks in the same room in the datacenter ? In the event of a catastrophic failure that permanently destroys one rack, how realistic is it that the other racks are unharmed ? If the rack is destroyed by fire and sits in a row with the other racks, there is a very high chance that the neighbouring racks will also be damaged. Note that I am not a system architect nor a system administrator : I may be completely wrong ;-) If it turns out that the probability of a single rack failing entirely and independently of the others is negligible, it may not be necessary to write a complex ruleset, and the default ruleset could be used instead.
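
For reference, sticking with the default ruleset is just a matter of letting the erasure code profile generate it. A minimal sketch, assuming firefly-era commands; the profile and pool names and the PG counts are only examples:

  # k=8 m=3 profile, with the default failure domain (host)
  ceph osd erasure-code-profile set myprofile k=8 m=3
  # creating the pool generates the matching ruleset from the profile
  ceph osd pool create ecpool 1024 1024 erasure myprofile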

My 2cts
 
> 
> Thanks again!
> 
> Leidong
> 
> 
> 
> 
> 
>> On Sep 9, 2014, at 7:53 PM, "Loic Dachary" <loic at dachary.org> wrote:
>>
>> Hi,
>>
>> It is indeed possible that the mapping fails if there are just enough racks to match the constraint. And the probability of a bad mapping increases as the number of PGs increases, because more mappings are needed. You can tell CRUSH to try harder with 
>>
>> step set_chooseleaf_tries 10
>>
>> Be careful though: increasing this number will change mapping. It will not just fix the bad mappings you're seeing, it will also change the mappings that succeeded with a lower value. Once you've set this parameter, it cannot be modified.
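>>
>> To make it concrete, the step goes at the top of the rule, before the choose steps. With the rule from your original mail (quoted below) it would look something like this; the rule name, ruleset id and min/max sizes are only examples:
>>
>> rule ecpool_racks {
>>     ruleset 3
>>     type erasure
>>     min_size 3
>>     max_size 11
>>     step set_chooseleaf_tries 10      # retry harder before giving up
>>     step take default
>>     step choose firstn 6 type rack    # 6 racks
>>     step chooseleaf indep 2 type osd  # 2 OSDs per rack
>>     step emit
>> }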
>>
>> Would you mind sharing the erasure code profile you plan to work with ?
>>
>> Cheers
>>
>>> On 09/09/2014 12:39, Lei Dong wrote:
>>> Hi ceph users:
>>>
>>> I want to create a customized CRUSH rule for my EC pool (with replica_size = 11) to distribute the replicas across 6 different racks. 
>>>
>>> I used the following rule at first:
>>>
>>> step take default                   # root
>>> step choose firstn 6 type rack      # 6 racks; I have exactly 6 racks
>>> step chooseleaf indep 2 type osd    # 2 OSDs per rack
>>> step emit
>>>
>>> It looks fine and works fine when the PG num is small. 
>>> But when the PG num increases, there are always some PGs which cannot take all of the 6 racks. 
>>> It looks like "step choose firstn 6 type rack" sometimes returns only 5 racks.
>>> After some investigation, I think it may be caused by collision of choices.
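>>>
>>> (Side note: the bad mappings can be reproduced offline with crushtool, without touching the cluster; the file names and rule id below are placeholders.)
>>>
>>> ceph osd getcrushmap -o crush.bin       # current crush map, binary
>>> crushtool -d crush.bin -o crush.txt     # decompile to text for editing
>>> crushtool -c crush.txt -o crush.new     # recompile after changes
>>> crushtool -i crush.new --test --rule 1 --num-rep 11 --show-bad-mappings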
>>>
>>> Then I come up with another solution to solve collision like this:
>>>
>>> step take rack0
>>> step chooseleaf indep 2 type osd
>>> step emit
>>> step take rack1
>>> ...
>>> (manually take every rack)
>>>
>>> This won't cause rack collisions, because I specify each rack by name. But the problem is that an OSD in rack0 will always be the primary OSD, because I choose from rack0 first.
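>>>
>>> Written out in full, that second rule looks roughly like this (the rack names follow the rack0/rack1 naming above; the rule name, ruleset id and min/max sizes are only examples):
>>>
>>> rule ec_manual_racks {
>>>     ruleset 4
>>>     type erasure
>>>     min_size 3
>>>     max_size 11
>>>     step take rack0
>>>     step chooseleaf indep 2 type osd
>>>     step emit
>>>     step take rack1
>>>     step chooseleaf indep 2 type osd
>>>     step emit
>>>     step take rack2
>>>     step chooseleaf indep 2 type osd
>>>     step emit
>>>     step take rack3
>>>     step chooseleaf indep 2 type osd
>>>     step emit
>>>     step take rack4
>>>     step chooseleaf indep 2 type osd
>>>     step emit
>>>     step take rack5
>>>     step chooseleaf indep 2 type osd
>>>     step emit
>>> }
>>>
>>> The OSDs from the first emit end up first in the result, which is why the primary always lands in rack0.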
>>>
>>> So the question is: what is the recommended way to meet such a need (distribute 11 replicas evenly across 6 racks, so that a rack failure can be tolerated)?
>>>
>>>
>>> Thanks!
>>> LeiDong
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> ceph-users mailing list
>>> ceph-users at lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>> -- 
>> Loïc Dachary, Artisan Logiciel Libre
>>

-- 
Loïc Dachary, Artisan Logiciel Libre


