Re: Monitor segfaults when updating the crush map

Stephen,
You are right. The crash can happen if the replica size doesn't match the number of OSDs. I'm not sure whether there is another way to express your requirement: "choose the first two replicas from one rack, and the third replica from any other rack (different from the first)."

A couple of different thoughts:


1) If you have 3 racks, you can try a rule that chooses 3 racks and then chooses 1 leaf (host) from each, ensuring three replicas across three separate racks (see the sketch below).


2) Another thought:

step take rack1
step chooseleaf firstn 2 type host
step emit
step take rack2
step chooseleaf firstn 1 type host
step emit

This of course pins the first 2 replicas to rack1 and may become unbalanced (make sure there is enough storage in rack1).
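
For reference, here is a rough sketch of how the two approaches might look as complete rules. This is only a sketch: the bucket names (default, rack1, rack2), ruleset numbers, and rule names are placeholders that need to match your own crush map, and it is worth verifying with crushtool --test before injecting anything.

# Option 1: one replica in each of three racks (assumes at least 3 racks)
rule three_racks {
	ruleset 2
	type replicated
	min_size 1
	max_size 10
	step take default
	step choose firstn 3 type rack
	step chooseleaf firstn 1 type host
	step emit
}

# Option 2: first two replicas in rack1, third replica in rack2
rule rack1_then_rack2 {
	ruleset 3
	type replicated
	min_size 1
	max_size 10
	step take rack1
	step chooseleaf firstn 2 type host
	step emit
	step take rack2
	step chooseleaf firstn 1 type host
	step emit
}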

Thanks,
Johnu
From: Stephen Jahl <stephenjahl@xxxxxxxxx>
Date: Thursday, October 9, 2014 at 11:11 AM
To: Loic Dachary <loic@xxxxxxxxxxx>
Cc: "ceph-users@xxxxxxxxxxxxxx" <ceph-users@xxxxxxxxxxxxxx>
Subject: Re: Monitor segfaults when updating the crush map

Thanks Loic,

In my case, I actually only have three replicas for my pools -- with this rule, I'm trying to ensure that OSDs in at least two racks are selected. Since the replica size is only 3, I think I'm still affected by the bug (unless of course I set my replica size to 4).

Is there a better way I can express what I want in the crush rule, preferably in a way not hit by that bug ;) ? Is there an ETA on when that bugfix might land in firefly?

Best,
-Steve

On Thu, Oct 9, 2014 at 1:59 PM, Loic Dachary <loic@xxxxxxxxxxx> wrote:
Hi Stephen,

It looks like you're hitting http://tracker.ceph.com/issues/9492 which has been fixed but is not yet available in firefly. The simplest workaround in this case is to set min_size to 4.
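
For example, assuming the affected pool is named <poolname>, that would be something like:

    ceph osd pool set <poolname> min_size 4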

Cheers

On 09/10/2014 19:31, Stephen Jahl wrote:
> Hi All,
>
> I'm trying to add a crush rule to my map, which looks like this:
>
> rule rack_ruleset {
> ruleset 1
> type replicated
> min_size 1
> max_size 10
> step take default
> step choose firstn 2 type rack
> step chooseleaf firstn 2 type host
> step emit
> }
>
> I'm not configuring any pools to use the ruleset at this time. When I recompile the map and test the rule with crushtool --test, everything seems fine, and I'm not noticing anything out of the ordinary.
>
> But, when I try to inject the compiled crush map back into the cluster like this:
>
> ceph osd setcrushmap -i /path/to/compiled-crush-map
>
> The monitor process appears to stop, and I see a monitor election happening. Things hang until I ^C the setcrushmap command, and I need to restart the monitor processes to make things happy again (and the crush map never ends up getting updated).
>
> In the monitor logs, I see several segfaults that look like this: http://pastebin.com/K1XqPpbF
>
> I'm running ceph 0.80.5-1trusty on Ubuntu 14.04 with kernel 3.13.0-35-generic.
>
> Anyone have any ideas as to what is happening?
>
> -Steve
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

--
Loïc Dachary, Artisan Logiciel Libre


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
