On 17/09/2014 22:03, Johnu George (johnugeo) wrote:
> Loic,
>      You are right. Are we planning to support configurations where
> replica number is different from the number of osds selected from a rule?

I think crush should support it, yes. If a rule can provide 10 OSDs there
is no reason for it to fail to provide just one.

Cheers

> If not, one solution is to add a validation check when a rule is activated
> for a pool of a specific replica count.
>
> Johnu
>
> On 9/17/14, 9:10 AM, "Loic Dachary" <loic@xxxxxxxxxxx> wrote:
>
>> Hi,
>>
>> If the number of replicas desired is 1, then
>>
>> https://github.com/ceph/ceph/blob/firefly/src/crush/CrushWrapper.h#L915
>>
>> will be called with maxout = 1 and scratch will be maxout * 3. But if the
>> rule always selects 4 items, then it overflows. Is that what you also read?
>>
>> Cheers
>>
>> On 17/09/2014 16:42, Johnu George (johnugeo) wrote:
>>> Adding ceph-devel
>>>
>>> On 9/17/14, 1:27 AM, "Loic Dachary" <loic@xxxxxxxxxxx> wrote:
>>>
>>>> Could you resend with ceph-devel in cc ? It's better for archive
>>>> purposes ;-)
>>>>
>>>> On 17/09/2014 09:37, Johnu George (johnugeo) wrote:
>>>>> Hi Sage,
>>>>>      I was looking at the crash that was reported in this mail chain.
>>>>> I am seeing that the crash happens when the number of replicas
>>>>> configured is less than the total number of osds to be selected as per
>>>>> the rule. This is because the crush temporary buffers are allocated as
>>>>> per num_rep size (the scratch array has size num_rep * 3). So, when the
>>>>> number of osds to be selected is more, a buffer overflow happens and it
>>>>> causes an error/crash. I saw your earlier comment in this mail where
>>>>> you asked to create a rule that selects two osds per rack (2 racks)
>>>>> with num_rep=3. I feel that the buffer overflow issue should happen in
>>>>> this situation too, which can cause 'out of array' access. Am I wrong
>>>>> somewhere or am I missing something?
>>>>>
>>>>> Johnu
>>>>>
>>>>> On 9/16/14, 9:39 AM, "Daniel Swarbrick"
>>>>> <daniel.swarbrick@xxxxxxxxxxxxxxxx> wrote:
>>>>>
>>>>>> Hi Loic,
>>>>>>
>>>>>> Thanks for providing a detailed example. I'm able to run the example
>>>>>> that you provide, and also got my own live crushmap to produce some
>>>>>> results, when I appended the "--num-rep 3" option to the command.
>>>>>> Without that option, even your example is throwing segfaults - maybe
>>>>>> a bug in crushtool?
>>>>>>
>>>>>> One other area I wasn't sure about - can the final "chooseleaf" step
>>>>>> specify "firstn 0" for simplicity's sake (and to automatically handle
>>>>>> a larger pool size in future)? Would there be any downside to this?
>>>>>>
>>>>>> Cheers
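To make the sizing mismatch concrete, below is a minimal, self-contained
sketch of the pattern described above. Everything in it (fake_do_rule,
rule_emits) is invented for illustration and is not Ceph's actual API; it
only shows why a scratch buffer sized from num_rep (maxout * 3) cannot be
large enough once the rule itself emits more items than num_rep, e.g. a
2-rack x 2-host rule tested with num_rep = 1.

#include <algorithm>
#include <cassert>
#include <vector>

// Invented stand-in for the mapping call: a rule that always emits
// 'rule_emits' items needs room for roughly rule_emits entries in each of
// the three working areas carved out of 'scratch', no matter how many
// results the caller asked for. This fake only checks that the space would
// be there; it does not reimplement the mapping itself.
int fake_do_rule(int rule_emits, std::vector<int>& out, int num_rep,
                 const std::vector<int>& scratch)
{
    assert(static_cast<int>(scratch.size()) >= rule_emits * 3 &&
           "scratch too small: rule emits more items than num_rep");
    for (int i = 0; i < rule_emits && i < num_rep; ++i)
        out[i] = i;                          // pretend these are OSD ids
    return std::min(rule_emits, num_rep);
}

int main()
{
    const int num_rep = 1;      // replica count being tested (e.g. --num-rep 1)
    const int rule_emits = 4;   // choose firstn 2 racks, chooseleaf firstn 2 hosts
    std::vector<int> out(num_rep);
    std::vector<int> scratch(num_rep * 3);   // sized from num_rep, as described above
    // 4 * 3 > 1 * 3, so the assert fires here; without such a check the same
    // situation becomes an out-of-bounds write instead of a clean failure.
    fake_do_rule(rule_emits, out, num_rep, scratch);
    return 0;
}

Running this aborts on the assert; if the real code path has no equivalent
check, the same mismatch turns into an out-of-bounds write, which would
explain the segfaults reported above.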
>>>>>> On 16/09/14 16:20, Loic Dachary wrote:
>>>>>>> Hi Daniel,
>>>>>>>
>>>>>>> When I run
>>>>>>>
>>>>>>> crushtool --outfn crushmap --build --num_osds 100 host straw 2 rack straw 10 default straw 0
>>>>>>> crushtool -d crushmap -o crushmap.txt
>>>>>>> cat >> crushmap.txt <<EOF
>>>>>>> rule myrule {
>>>>>>>         ruleset 1
>>>>>>>         type replicated
>>>>>>>         min_size 1
>>>>>>>         max_size 10
>>>>>>>         step take default
>>>>>>>         step choose firstn 2 type rack
>>>>>>>         step chooseleaf firstn 2 type host
>>>>>>>         step emit
>>>>>>> }
>>>>>>> EOF
>>>>>>> crushtool -c crushmap.txt -o crushmap
>>>>>>> crushtool -i crushmap --test --show-utilization --rule 1 --min-x 1 --max-x 10 --num-rep 3
>>>>>>>
>>>>>>> I get
>>>>>>>
>>>>>>> rule 1 (myrule), x = 1..10, numrep = 3..3
>>>>>>> CRUSH rule 1 x 1 [79,69,10]
>>>>>>> CRUSH rule 1 x 2 [56,58,60]
>>>>>>> CRUSH rule 1 x 3 [30,26,19]
>>>>>>> CRUSH rule 1 x 4 [14,8,69]
>>>>>>> CRUSH rule 1 x 5 [7,4,88]
>>>>>>> CRUSH rule 1 x 6 [54,52,37]
>>>>>>> CRUSH rule 1 x 7 [69,67,19]
>>>>>>> CRUSH rule 1 x 8 [51,46,83]
>>>>>>> CRUSH rule 1 x 9 [55,56,35]
>>>>>>> CRUSH rule 1 x 10 [54,51,95]
>>>>>>> rule 1 (myrule) num_rep 3 result size == 3: 10/10
>>>>>>>
>>>>>>> What command are you running to get a core dump ?
>>>>>>>
>>>>>>> Cheers
>>>>>>>
>>>>>>> On 16/09/2014 12:02, Daniel Swarbrick wrote:
>>>>>>>> On 15/09/14 17:28, Sage Weil wrote:
>>>>>>>>> rule myrule {
>>>>>>>>>         ruleset 1
>>>>>>>>>         type replicated
>>>>>>>>>         min_size 1
>>>>>>>>>         max_size 10
>>>>>>>>>         step take default
>>>>>>>>>         step choose firstn 2 type rack
>>>>>>>>>         step chooseleaf firstn 2 type host
>>>>>>>>>         step emit
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> That will give you 4 osds, spread across 2 hosts in each rack. The
>>>>>>>>> pool size (replication factor) is 3, so RADOS will just use the
>>>>>>>>> first three (2 hosts in first rack, 1 host in second rack).
>>>>>>>>
>>>>>>>> I have a similar requirement, where we currently have four nodes,
>>>>>>>> two in each fire zone, with pool size 3. At the moment, due to the
>>>>>>>> number of nodes, we are guaranteed at least one replica in each fire
>>>>>>>> zone (which we represent with bucket type "room"). If we add more
>>>>>>>> nodes in future, the current ruleset may cause all three replicas of
>>>>>>>> a PG to land in a single zone.
>>>>>>>>
>>>>>>>> I tried the ruleset suggested above (replacing "rack" with "room"),
>>>>>>>> but when testing it with crushtool --test --show-utilization, I
>>>>>>>> simply get segfaults. No amount of fiddling around seems to make it
>>>>>>>> work - even adding two new hypothetical nodes to the crushmap
>>>>>>>> doesn't help.
>>>>>>>>
>>>>>>>> What could I perhaps be doing wrong?
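On the validation idea raised at the top of the thread: such a check might
look roughly like the sketch below. The names (PoolSettings,
pool_can_use_rule, rule_emits) are invented for illustration and are not
Ceph's interfaces, and the number of OSDs the rule selects is passed in
directly rather than derived from the rule's steps. Since the preferred
outcome above is for CRUSH to simply tolerate the mismatch, this only
illustrates the fallback of refusing the configuration up front instead of
overflowing at mapping time.

#include <iostream>
#include <string>

// Invented types and names, purely to illustrate the shape of the check.
struct PoolSettings {
    std::string name;
    int size;            // desired replica count (num_rep)
};

// 'rule_emits' stands for the number of OSDs the rule's steps select
// (e.g. choose firstn 2 racks x chooseleaf firstn 2 hosts = 4). In a real
// implementation it would have to be computed from the rule, not passed in.
bool pool_can_use_rule(const PoolSettings& pool, int rule_emits, std::string* err)
{
    if (rule_emits > pool.size) {
        *err = "pool '" + pool.name + "' has size " + std::to_string(pool.size) +
               " but the rule selects " + std::to_string(rule_emits) +
               " OSDs; refusing until the mismatch is supported";
        return false;
    }
    return true;
}

int main()
{
    PoolSettings pool{"rbd", 3};
    std::string err;
    if (!pool_can_use_rule(pool, 4, &err))
        std::cerr << err << "\n";   // prints the refusal for the 2x2 rule with size 3
    return 0;
}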
--
Loïc Dachary, Artisan Logiciel Libre