Re: Monitor segfaults when updating the crush map

So, I _do_ have three racks, but unfortunately, one of them has fewer OSDs in it. Weighting compensates for some of that, but I still end up with an uneven distribution (according to the utilization numbers from crushtool --test). That's how I ended up going down the "at least two racks" route.
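For reference, the utilization check I'm describing is roughly the following (the paths, rule number, and replica count here are just placeholders):

crushtool -c crush-map.txt -o crush-map.compiled
crushtool --test -i crush-map.compiled --rule 1 --num-rep 3 --show-utilization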

I'll have to play around with various rules and see what works. Adding more OSDs to the third rack to even things up might be on the roadmap now as well :)

On Thu, Oct 9, 2014 at 2:37 PM, Johnu George (johnugeo) <johnugeo@xxxxxxxxx> wrote:
Stephen,
You are right. The crash can happen if the replica size doesn't match the number of OSDs. I am not sure if there exists any other solution for your problem: "choose the first 2 replicas from a rack and choose the third replica from any other rack different from the first one".

Some different thoughts:


1) If you have 3 racks, you can try choosing 3 racks and then chooseleaf 1 host, ensuring three separate racks and three replicas; see the sketch below.
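As a rough sketch in the decompiled map format (the rule name, ruleset number, and size bounds are placeholders; "default" is the root bucket as in your existing rule):

rule three_rack_ruleset {
        ruleset 2
        type replicated
        min_size 1
        max_size 10
        step take default
        step choose firstn 3 type rack
        step chooseleaf firstn 1 type host
        step emit
}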


2) Another thought:

step take rack1
step chooseleaf firstn 2 type host
step emit
step take rack2
step chooseleaf firstn 1 type host
step emit

This of course restricts the first 2 replicas to rack1 and may become unbalanced. (Ensure there is enough storage in rack1.)
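Written out as a complete rule, that would look something like this (again, the rule name, ruleset number, size bounds, and the rack bucket names rack1/rack2 are placeholders that should match your map):

rule two_rack_ruleset {
        ruleset 3
        type replicated
        min_size 1
        max_size 10
        step take rack1
        step chooseleaf firstn 2 type host
        step emit
        step take rack2
        step chooseleaf firstn 1 type host
        step emit
}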

Thanks,
Johnu
From: Stephen Jahl <stephenjahl@xxxxxxxxx>
Date: Thursday, October 9, 2014 at 11:11 AM
To: Loic Dachary <loic@xxxxxxxxxxx>
Cc: "ceph-users@xxxxxxxxxxxxxx" <ceph-users@xxxxxxxxxxxxxx>
Subject: Re: Monitor segfaults when updating the crush map

Thanks Loic,

In my case, I actually only have three replicas for my pools -- with this rule, I'm trying to ensure that OSDs in at least two racks are selected. Since the replica size is only 3, I think I'm still affected by the bug (unless of course I set my replica size to 4).

Is there a better way I can express what I want in the crush rule, preferably in a way that isn't hit by that bug ;)? Is there an ETA on when that bugfix might land in firefly?

Best,
-Steve

On Thu, Oct 9, 2014 at 1:59 PM, Loic Dachary <loic@xxxxxxxxxxx> wrote:
Hi Stephen,

It looks like you're hitting http://tracker.ceph.com/issues/9492, which has been fixed but is not yet available in firefly. The simplest workaround is to set min_size to 4 in this case.

Cheers

On 09/10/2014 19:31, Stephen Jahl wrote:
> Hi All,
>
> I'm trying to add a crush rule to my map, which looks like this:
>
> rule rack_ruleset {
>         ruleset 1
>         type replicated
>         min_size 1
>         max_size 10
>         step take default
>         step choose firstn 2 type rack
>         step chooseleaf firstn 2 type host
>         step emit
> }
>
> I'm not configuring any pools to use the ruleset at this time. When I recompile the map and test the rule with crushtool --test, everything seems fine, and I'm not noticing anything out of the ordinary.
>
> But, when I try to inject the compiled crush map back into the cluster like this:
>
> ceph osd setcrushmap -i /path/to/compiled-crush-map
>
> The monitor process appears to stop, and I see a monitor election happening. Things hang until I ^C the setcrushmap command, and I need to restart the monitor processes to make things happy again (and the crush map never ends up getting updated).
>
> In the monitor logs, I see several segfaults that look like this: http://pastebin.com/K1XqPpbF
>
> I'm running ceph 0.80.5-1trusty on Ubuntu 14.04 with kernel 3.13.0-35-generic.
>
> Anyone have any ideas as to what is happening?
>
> -Steve
>

--
Loïc Dachary, Artisan Logiciel Libre


