Adding ceph-devel

On 9/17/14, 1:27 AM, "Loic Dachary" <loic@xxxxxxxxxxx> wrote:
>
> Could you resend with ceph-devel in cc? It's better for archive
> purposes ;-)
>
> On 17/09/2014 09:37, Johnu George (johnugeo) wrote:
>> Hi Sage,
>>      I was looking at the crash that was reported in this mail chain.
>> I am seeing that the crash happens when the number of replicas
>> configured is less than the total number of osds to be selected as per
>> the rule. This is because the crush temporary buffers are allocated
>> based on num_rep (the scratch array has size num_rep * 3). So, when the
>> number of osds to be selected is larger, a buffer overflow happens and
>> causes the error/crash. I saw your earlier comment in this mail where
>> you asked to create a rule that selects two osds per rack (2 racks)
>> with num_rep=3. I feel that the buffer overflow issue should happen in
>> this situation too, which can cause 'out of array' access. Am I wrong
>> somewhere or am I missing something?
>>
>> Johnu
>>
>> On 9/16/14, 9:39 AM, "Daniel Swarbrick"
>> <daniel.swarbrick@xxxxxxxxxxxxxxxx> wrote:
>>
>>> Hi Loic,
>>>
>>> Thanks for providing a detailed example. I'm able to run the example
>>> that you provide, and also got my own live crushmap to produce some
>>> results, when I appended the "--num-rep 3" option to the command.
>>> Without that option, even your example throws segfaults - maybe a
>>> bug in crushtool?
>>>
>>> One other area I wasn't sure about - can the final "chooseleaf" step
>>> specify "firstn 0" for simplicity's sake (and to automatically handle
>>> a larger pool size in future)? Would there be any downside to this?
>>>
>>> Cheers
>>>
>>> On 16/09/14 16:20, Loic Dachary wrote:
>>>> Hi Daniel,
>>>>
>>>> When I run
>>>>
>>>> crushtool --outfn crushmap --build --num_osds 100 host straw 2 rack
>>>> straw 10 default straw 0
>>>> crushtool -d crushmap -o crushmap.txt
>>>> cat >> crushmap.txt <<EOF
>>>> rule myrule {
>>>>     ruleset 1
>>>>     type replicated
>>>>     min_size 1
>>>>     max_size 10
>>>>     step take default
>>>>     step choose firstn 2 type rack
>>>>     step chooseleaf firstn 2 type host
>>>>     step emit
>>>> }
>>>> EOF
>>>> crushtool -c crushmap.txt -o crushmap
>>>> crushtool -i crushmap --test --show-utilization --rule 1 --min-x 1
>>>> --max-x 10 --num-rep 3
>>>>
>>>> I get
>>>>
>>>> rule 1 (myrule), x = 1..10, numrep = 3..3
>>>> CRUSH rule 1 x 1 [79,69,10]
>>>> CRUSH rule 1 x 2 [56,58,60]
>>>> CRUSH rule 1 x 3 [30,26,19]
>>>> CRUSH rule 1 x 4 [14,8,69]
>>>> CRUSH rule 1 x 5 [7,4,88]
>>>> CRUSH rule 1 x 6 [54,52,37]
>>>> CRUSH rule 1 x 7 [69,67,19]
>>>> CRUSH rule 1 x 8 [51,46,83]
>>>> CRUSH rule 1 x 9 [55,56,35]
>>>> CRUSH rule 1 x 10 [54,51,95]
>>>> rule 1 (myrule) num_rep 3 result size == 3: 10/10
>>>>
>>>> What command are you running to get a core dump?
>>>>
>>>> Cheers
>>>>
>>>> On 16/09/2014 12:02, Daniel Swarbrick wrote:
>>>>> On 15/09/14 17:28, Sage Weil wrote:
>>>>>> rule myrule {
>>>>>>     ruleset 1
>>>>>>     type replicated
>>>>>>     min_size 1
>>>>>>     max_size 10
>>>>>>     step take default
>>>>>>     step choose firstn 2 type rack
>>>>>>     step chooseleaf firstn 2 type host
>>>>>>     step emit
>>>>>> }
>>>>>>
>>>>>> That will give you 4 osds, spread across 2 hosts in each rack. The
>>>>>> pool size (replication factor) is 3, so RADOS will just use the
>>>>>> first three (2 hosts in first rack, 1 host in second rack).
>>>>> I have a similar requirement, where we currently have four nodes,
>>>>> two in each fire zone, with pool size 3.
>>>>> At the moment, due to the number of nodes, we are guaranteed at
>>>>> least one replica in each fire zone (which we represent with bucket
>>>>> type "room"). If we add more nodes in future, the current ruleset
>>>>> may cause all three replicas of a PG to land in a single zone.
>>>>>
>>>>> I tried the ruleset suggested above (replacing "rack" with "room"),
>>>>> but when testing it with crushtool --test --show-utilization, I
>>>>> simply get segfaults. No amount of fiddling around seems to make it
>>>>> work - even adding two new hypothetical nodes to the crushmap
>>>>> doesn't help.
>>>>>
>>>>> What could I perhaps be doing wrong?
>>>>>
>>>>
>>>
>>
>
> --
> Loïc Dachary, Artisan Logiciel Libre
>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
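
[Editor's sketch] To make the sizing mismatch Johnu describes above easier to picture, here is a minimal, self-contained C sketch. It is a toy model only, not the actual CRUSH mapper code: the layout (a 3 * num_rep scratch allocation split into three working arrays of num_rep entries each) and the rule output of four OSDs (2 racks x 2 hosts) are assumptions taken from the discussion, purely for illustration.

#include <stdio.h>
#include <stdlib.h>

/*
 * Toy model only -- not the actual Ceph CRUSH source. It mimics the
 * mismatch described in the thread: the scratch area is sized from
 * num_rep (3 * num_rep ints) and carved into three working arrays of
 * num_rep entries each, but a rule such as
 *   step choose firstn 2 type rack
 *   step chooseleaf firstn 2 type host
 * emits 4 OSDs.
 */
int main(void)
{
    int num_rep = 3;      /* replication factor / --num-rep        */
    int rule_emits = 4;   /* 2 racks x 2 hosts selected by the rule */

    int *scratch = malloc(sizeof(int) * num_rep * 3);
    if (!scratch)
        return 1;

    int *a = scratch;                /* three working arrays, */
    int *b = scratch + num_rep;      /* num_rep entries each  */
    int *c = scratch + 2 * num_rep;

    for (int i = 0; i < rule_emits; i++) {
        a[i] = i;   /* i == 3 spills into b[0]                        */
        b[i] = i;   /* i == 3 spills into c[0]                        */
        c[i] = i;   /* i == 3 writes one int past the 3*num_rep
                       allocation: the 'out of array' access suspected
                       in the thread                                   */
    }

    printf("rule emitted %d results into buffers sized for %d\n",
           rule_emits, num_rep);
    free(scratch);
    return 0;
}

In this toy, only the fourth result written into the last working array actually leaves the allocation; whether and how that corrupts memory is undefined behaviour, which would be consistent with Daniel's observation that Loic's example only segfaults when --num-rep is omitted and works once --num-rep 3 is supplied.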