Re: Monitor segfaults when updating the crush map

Stephen,
You are right. The crash can happen if the replica size doesn't match the number of OSDs. I'm not sure whether there is another way to express your requirement: "choose the first two replicas from one rack, and the third replica from any other rack (different from the first)."

A couple of different thoughts:


1) If you have 3 racks, you can try a rule that chooses 3 racks and then chooses 1 leaf (host) from each, ensuring three replicas across three separate racks (see the sketch below).


2) Another thought:

step take rack1
step chooseleaf firstn 2 type host
step emit
step take rack2
step chooseleaf firstn 1 type host
step emit

This of course pins the first 2 replicas to rack1 and may become unbalanced (make sure there is enough storage in rack1).
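
For reference, here is a rough sketch of how the two approaches might look as complete rules. This is only a sketch: the bucket names (default, rack1, rack2), ruleset numbers, and rule names are placeholders that need to match your own crush map, and it is worth verifying with crushtool --test before injecting anything.

# Option 1: one replica in each of three racks (assumes at least 3 racks)
rule three_racks {
	ruleset 2
	type replicated
	min_size 1
	max_size 10
	step take default
	step choose firstn 3 type rack
	step chooseleaf firstn 1 type host
	step emit
}

# Option 2: first two replicas in rack1, third replica in rack2
rule rack1_then_rack2 {
	ruleset 3
	type replicated
	min_size 1
	max_size 10
	step take rack1
	step chooseleaf firstn 2 type host
	step emit
	step take rack2
	step chooseleaf firstn 1 type host
	step emit
}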

Thanks,
Johnu
From: Stephen Jahl <stephenjahl@xxxxxxxxx>
Date: Thursday, October 9, 2014 at 11:11 AM
To: Loic Dachary <loic@xxxxxxxxxxx>
Cc: "ceph-users@xxxxxxxxxxxxxx" <ceph-users@xxxxxxxxxxxxxx>
Subject: Re: Monitor segfaults when updating the crush map

Thanks Loic,

In my case, I actually only have three replicas for my pools -- with this rule, I'm trying to ensure that OSDs in at least two racks are selected. Since the replica size is only 3, I think I'm still affected by the bug (unless of course I set my replica size to 4).

Is there a better way I can express what I want in the crush rule, preferably in a way not hit by that bug ;) ? Is there an ETA on when that bugfix might land in firefly?

Best,
-Steve

On Thu, Oct 9, 2014 at 1:59 PM, Loic Dachary <loic@xxxxxxxxxxx> wrote:
Hi Stephen,

It looks like you're hitting http://tracker.ceph.com/issues/9492 which has been fixed but is not yet available in firefly. The simplest workaround in this case is to set min_size to 4.
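
For example, assuming the affected pool is named <poolname>, that would be something like:

    ceph osd pool set <poolname> min_size 4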

Cheers

On 09/10/2014 19:31, Stephen Jahl wrote:
> Hi All,
>
> I'm trying to add a crush rule to my map, which looks like this:
>
> rule rack_ruleset {
> ruleset 1
> type replicated
> min_size 1
> max_size 10
> step take default
> step choose firstn 2 type rack
> step chooseleaf firstn 2 type host
> step emit
> }
>
> I'm not configuring any pools to use the ruleset at this time. When I recompile the map and test the rule with crushtool --test, everything seems fine, and I'm not noticing anything out of the ordinary.
>
> But, when I try to inject the compiled crush map back into the cluster like this:
>
> ceph osd setcrushmap -i /path/to/compiled-crush-map
>
> The monitor process appears to stop, and I see a monitor election happening. Things hang until I ^C the setcrushmap command, and I need to restart the monitor processes to make things happy again (and the crush map never ends up getting updated).
>
> In the monitor logs, I see several segfaults that look like this: http://pastebin.com/K1XqPpbF
>
> I'm running ceph 0.80.5-1trusty on Ubuntu 14.04 with kernel 3.13.0-35-generic.
>
> Anyone have any ideas as to what is happening?
>
> -Steve
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

--
Loïc Dachary, Artisan Logiciel Libre


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
