Re: [ceph-users] Crushmap ruleset for rack aware PG placement

On 17/09/2014 22:03, Johnu George (johnugeo) wrote:
> Loic,
>       You are right. Are we planning to support configurations where the
> replica count is different from the number of OSDs selected by a rule?

I think crush should support it, yes. If a rule can provide 10 OSDs, there is no reason for it to fail to provide just one.
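
For reference, the crushmap built further down in this thread can exercise exactly that case by lowering --num-rep below the four OSDs the rule emits, e.g.:

crushtool -i crushmap --test --show-utilization --rule 1 --min-x 1 --max-x 10 --num-rep 1

If Johnu's reading of the scratch buffer sizing is right, this invocation should currently trip the overflow, whereas ideally it would simply return the first OSD for each x.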

Cheers

> If not, one solution is to add a validation check when a rule is assigned
> to a pool with a specific replica count.
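
Something along those lines could compare the pool's replica count against the min_size/max_size the rule declares, at the moment the rule is assigned to the pool. Below is a rough, hypothetical sketch against the raw crush structs (the function name and its placement are mine, not existing code, and it only checks the rule's declared range rather than how many OSDs the steps actually emit):

#include <errno.h>
#include "crush/crush.h"   /* struct crush_map, struct crush_rule; header path assumed */

/* Hypothetical check: refuse to attach a rule to a pool whose replica
 * count falls outside the min_size/max_size range the rule declares. */
static int crush_rule_fits_pool_size(const struct crush_map *map,
                                     int ruleno, int pool_size)
{
        const struct crush_rule *rule;

        if (ruleno < 0 || (unsigned)ruleno >= map->max_rules ||
            !map->rules[ruleno])
                return -ENOENT;   /* no such rule in the map */

        rule = map->rules[ruleno];
        if (pool_size < rule->mask.min_size ||
            pool_size > rule->mask.max_size)
                return -EINVAL;   /* replica count outside the rule's declared range */

        return 0;
}

With something like that in place, the assignment could be rejected up front instead of letting the mapping code run with temporary buffers sized for fewer results than the rule produces.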
> 
> Johnu
> 
> On 9/17/14, 9:10 AM, "Loic Dachary" <loic@xxxxxxxxxxx> wrote:
> 
>> Hi,
>>
>> If the desired number of replicas is 1, then
>>
>> https://github.com/ceph/ceph/blob/firefly/src/crush/CrushWrapper.h#L915
>>
>> will be called with maxout = 1 and scratch will be of size maxout * 3. But if
>> the rule always selects 4 items, it overflows. Is that what you read as well?
>>
>> Cheers
>>
>> On 17/09/2014 16:42, Johnu George (johnugeo) wrote:
>>> Adding ceph-devel
>>>
>>> On 9/17/14, 1:27 AM, "Loic Dachary" <loic@xxxxxxxxxxx> wrote:
>>>
>>>>
>>>> Could you resend with ceph-devel in cc? It's better for archive purposes
>>>> ;-)
>>>>
>>>> On 17/09/2014 09:37, Johnu George (johnugeo) wrote:
>>>>> Hi Sage,
>>>>>          I was looking at the crash reported in this mail chain. I am
>>>>> seeing that the crash happens when the number of replicas configured is
>>>>> less than the total number of OSDs to be selected by the rule. This is
>>>>> because the CRUSH temporary buffers are allocated based on num_rep (the
>>>>> scratch array has size num_rep * 3), so when more OSDs have to be
>>>>> selected, the buffer overflows and causes an error/crash. I saw your
>>>>> earlier comment in this mail where you asked to create a rule that
>>>>> selects two OSDs per rack (2 racks) with num_rep=3. I feel that the
>>>>> buffer overflow should happen in that situation too, causing 'out of
>>>>> array' access. Am I wrong somewhere, or am I missing something?
>>>>>
>>>>> Johnu
>>>>>
>>>>> On 9/16/14, 9:39 AM, "Daniel Swarbrick"
>>>>> <daniel.swarbrick@xxxxxxxxxxxxxxxx> wrote:
>>>>>
>>>>>> Hi Loic,
>>>>>>
>>>>>> Thanks for providing a detailed example. I'm able to run the example
>>>>>> you provided, and also got my own live crushmap to produce results,
>>>>>> once I appended the "--num-rep 3" option to the command. Without that
>>>>>> option, even your example throws segfaults - maybe a bug in crushtool?
>>>>>>
>>>>>> One other area I wasn't sure about - can the final "chooseleaf" step
>>>>>> specify "firstn 0" for simplicity's sake (and to automatically handle a
>>>>>> larger pool size in future)? Would there be any downside to this?
>>>>>>
>>>>>> Cheers
>>>>>>
>>>>>> On 16/09/14 16:20, Loic Dachary wrote:
>>>>>>> Hi Daniel,
>>>>>>>
>>>>>>> When I run
>>>>>>>
>>>>>>> crushtool --outfn crushmap --build --num_osds 100 host straw 2 rack straw 10 default straw 0
>>>>>>> crushtool -d crushmap -o crushmap.txt
>>>>>>> cat >> crushmap.txt <<EOF
>>>>>>> rule myrule {
>>>>>>> 	ruleset 1
>>>>>>> 	type replicated
>>>>>>> 	min_size 1
>>>>>>> 	max_size 10
>>>>>>> 	step take default
>>>>>>> 	step choose firstn 2 type rack
>>>>>>> 	step chooseleaf firstn 2 type host
>>>>>>> 	step emit
>>>>>>> }
>>>>>>> EOF
>>>>>>> crushtool -c crushmap.txt -o crushmap
>>>>>>> crushtool -i crushmap --test --show-utilization --rule 1 --min-x 1 --max-x 10 --num-rep 3
>>>>>>>
>>>>>>> I get
>>>>>>>
>>>>>>> rule 1 (myrule), x = 1..10, numrep = 3..3
>>>>>>> CRUSH rule 1 x 1 [79,69,10]
>>>>>>> CRUSH rule 1 x 2 [56,58,60]
>>>>>>> CRUSH rule 1 x 3 [30,26,19]
>>>>>>> CRUSH rule 1 x 4 [14,8,69]
>>>>>>> CRUSH rule 1 x 5 [7,4,88]
>>>>>>> CRUSH rule 1 x 6 [54,52,37]
>>>>>>> CRUSH rule 1 x 7 [69,67,19]
>>>>>>> CRUSH rule 1 x 8 [51,46,83]
>>>>>>> CRUSH rule 1 x 9 [55,56,35]
>>>>>>> CRUSH rule 1 x 10 [54,51,95]
>>>>>>> rule 1 (myrule) num_rep 3 result size == 3:	10/10
>>>>>>>
>>>>>>> What command are you running to get a core dump?
>>>>>>>
>>>>>>> Cheers
>>>>>>>
>>>>>>> On 16/09/2014 12:02, Daniel Swarbrick wrote:
>>>>>>>> On 15/09/14 17:28, Sage Weil wrote:
>>>>>>>>> rule myrule {
>>>>>>>>> 	ruleset 1
>>>>>>>>> 	type replicated
>>>>>>>>> 	min_size 1
>>>>>>>>> 	max_size 10
>>>>>>>>> 	step take default
>>>>>>>>> 	step choose firstn 2 type rack
>>>>>>>>> 	step chooseleaf firstn 2 type host
>>>>>>>>> 	step emit
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> That will give you 4 osds, spread across 2 hosts in each rack. The
>>>>>>>>> pool size (replication factor) is 3, so RADOS will just use the
>>>>>>>>> first three (2 hosts in first rack, 1 host in second rack).
>>>>>>>> I have a similar requirement, where we currently have four nodes,
>>>>>>>> two in each fire zone, with pool size 3. At the moment, due to the
>>>>>>>> number of nodes, we are guaranteed at least one replica in each fire
>>>>>>>> zone (which we represent with bucket type "room"). If we add more
>>>>>>>> nodes in future, the current ruleset may cause all three replicas of
>>>>>>>> a PG to land in a single zone.
>>>>>>>>
>>>>>>>> I tried the ruleset suggested above (replacing "rack" with "room"),
>>>>>>>> but when testing it with crushtool --test --show-utilization, I
>>>>>>>> simply get segfaults. No amount of fiddling around seems to make it
>>>>>>>> work - even adding two new hypothetical nodes to the crushmap doesn't
>>>>>>>> help.
>>>>>>>>
>>>>>>>> What could I perhaps be doing wrong?
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>> -- 
>>>> Loïc Dachary, Artisan Logiciel Libre
>>>>
>>>
>>
>> -- 
>> Loïc Dachary, Artisan Logiciel Libre
>>
> 

-- 
Loïc Dachary, Artisan Logiciel Libre
