Right: I thought about data loss but what you're after is data availability. Thanks for explaining :-)

On 10/09/2014 04:29, Lei Dong wrote:
> Yes, my goal is to make it so that losing 3 OSDs does not lose data.
>
> My 6 racks may not be in different rooms, but they use 6 different
> switches, so I want my data to still be accessible when any switch is
> down or unreachable. I think it's not an unrealistic requirement.
>
>
> Thanks!
>
> LeiDong.
>
> On 9/9/14, 10:02 PM, "Loic Dachary" <loic at dachary.org> wrote:
>
>>
>>
>> On 09/09/2014 14:21, Lei Dong wrote:
>>> Thanks Loic!
>>>
>>> Actually I've found that increasing choose_local_fallback_tries can
>>> help (chooseleaf_tries helps less significantly), but I'm afraid that
>>> when an OSD failure happens and a new acting set has to be found, it
>>> may fail to find enough racks again. So I'm trying to find a more
>>> guaranteed way in case of OSD failure.
>>>
>>> My profile is nothing special other than k=8 m=3.
>>
>> So your goal is to make it so that losing 3 OSDs simultaneously does
>> not mean losing data. By forcing each rack to hold at most 2 OSDs for
>> a given object, you make it so that losing a full rack does not mean
>> losing data. Are these racks in the same room in the datacenter? In
>> the event of a catastrophic failure that permanently destroys one
>> rack, how realistic is it that the other racks are unharmed? If the
>> rack is destroyed by fire and is in a row with the five other racks,
>> there is a very high chance that the other racks will also be
>> damaged. Note that I am neither a system architect nor a system
>> administrator: I may be completely wrong ;-) If it turns out that the
>> probability of a single rack failing entirely and independently of
>> the others is negligible, it may not be necessary to make a complex
>> ruleset, and the default ruleset can be used instead.
>>
>> My 2cts
>>
>>>
>>> Thanks again!
>>>
>>> Leidong
>>>
>>>
>>>
>>>
>>>
>>>> On Sep 9, 2014, at 7:53 PM, "Loic Dachary" <loic at dachary.org> wrote:
>>>>
>>>> Hi,
>>>>
>>>> It is indeed possible that the mapping fails if there are just
>>>> enough racks to match the constraint. And the probability of a bad
>>>> mapping increases when the number of PGs increases, because more
>>>> mappings are needed. You can tell CRUSH to try harder with
>>>>
>>>> step set_chooseleaf_tries 10
>>>>
>>>> Be careful though: increasing this number will change the mappings.
>>>> It will not just fix the bad mappings you're seeing, it will also
>>>> change the mappings that succeeded with a lower value. Once you've
>>>> set this parameter, it cannot be modified.
>>>>
>>>> Would you mind sharing the erasure code profile you plan to work
>>>> with?
>>>>
>>>> Cheers
>>>>
>>>>> On 09/09/2014 12:39, Lei Dong wrote:
>>>>> Hi ceph users:
>>>>>
>>>>> I want to create a customized crush rule for my EC pool (with
>>>>> replica_size = 11) to distribute replicas across 6 different racks.
>>>>>
>>>>> I used the following rule at first:
>>>>>
>>>>> step take default                  // root
>>>>> step choose firstn 6 type rack     // 6 racks; I have exactly 6 racks
>>>>> step chooseleaf indep 2 type osd   // 2 OSDs per rack
>>>>> step emit
>>>>>
>>>>> It looks fine and works fine when the PG num is small.
>>>>> But when the PG num increases, there are always some PGs which
>>>>> cannot use all 6 racks.
>>>>> It looks like "step choose firstn 6 type rack" sometimes returns
>>>>> only 5 racks.
>>>>> After some investigation, I think it may be caused by collisions of
>>>>> choices.
>>>>>
>>>>> Then I came up with another solution to avoid collisions, like this:
>>>>>
>>>>> step take rack0
>>>>> step chooseleaf indep 2 type osd
>>>>> step emit
>>>>> step take rack1
>>>>> ...
>>>>> (manually take every rack)
>>>>>
>>>>> This won't cause rack collisions, because I specify each rack by
>>>>> name from the start. But the problem is that an OSD in rack0 will
>>>>> always be the primary OSD, because I choose from rack0 first.
>>>>>
>>>>> So the question is: what is the recommended way to meet such a need
>>>>> (distribute 11 replicas evenly across 6 racks so that a rack
>>>>> failure can be survived)?
>>>>>
>>>>>
>>>>> Thanks!
>>>>> LeiDong
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> ceph-users mailing list
>>>>> ceph-users at lists.ceph.com
>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>
>>>> --
>>>> Loïc Dachary, Artisan Logiciel Libre
>>>>
>>
>> --
>> Loïc Dachary, Artisan Logiciel Libre
>>
>

--
Loïc Dachary, Artisan Logiciel Libre
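
For reference, the pieces discussed in this thread fit together roughly as
follows in a decompiled CRUSH map. This is only a sketch, not a tested
ruleset from the thread: it assumes a root bucket named "default" and the
k=8 m=3 profile (11 chunks) mentioned above, and it simply combines Lei
Dong's original steps with the set_chooseleaf_tries tuning Loic suggests.
The rule name, ruleset id and min_size/max_size values are illustrative
placeholders.

    # Sketch only: rule name, ruleset id and min_size/max_size are placeholders.
    rule ecpool_6racks {
            ruleset 1
            type erasure
            min_size 3
            max_size 11
            # Tell CRUSH to retry harder, as suggested in the thread.
            step set_chooseleaf_tries 10
            step take default
            # Pick 6 racks, then 2 OSDs in each: 12 candidate slots for the
            # 11 chunks, so no single rack ever holds more than 2 chunks.
            step choose firstn 6 type rack
            step chooseleaf indep 2 type osd
            step emit
    }

Whether this actually eliminates the 5-rack mappings for a given map and PG
count can be checked offline, for example with crushtool's --test
--show-bad-mappings option, before the map is injected into the cluster.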