Re: CRUSH map help


 



Also, when you say new enough for the overselection hack, how new are we talking?

Cheers,
-
Stephen Mercier | Sr. Systems Architect
Attainia Capital Planning Solutions (ACPS)
O: (650)241-0567, 727 | TF: (866)288-2464, 727

On May 12, 2016, at 2:46 PM, Gregory Farnum wrote:

On Thu, May 12, 2016 at 2:36 PM, Stephen Mercier <stephen.mercier@xxxxxxxxxxxx> wrote:
I'm trying to set up a CRUSH rule, and I was hoping you guys could clarify something for me.

I have 4 storage nodes across 2 cabinets. (2x2)

I have the CRUSH hierarchy set up to reflect this layout (as follows):

rack cabinet2 {
        id -3           # do not change unnecessarily
        # weight xxxx
        alg straw
        hash 0          # rjenkins1
        item cephstore04 weight xxxx
        item cephstore02 weight xxxx
}
rack cabinet1 {
        id -2           # do not change unnecessarily
        # weight xxxx
        alg straw
        hash 0          # rjenkins1
        item cephstore03 weight xxxx
        item cephstore01 weight xxxx
}
root default {
        id -1           # do not change unnecessarily
        # weight xxxx
        alg straw
        hash 0          # rjenkins1
        item cabinet2 weight xxxx
        item cabinet1 weight xxxx
}
The default ruleset is as follows: (Big surprise!!)
rule replicated_ruleset {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step choose firstn 0 type osd
        step emit
}

If I want this to ensure that there is at least 1 copy of the data in each cabinet, would I just change it to:

rule replicated_ruleset {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step choose firstn 0 type rack
        step emit
}

Or should it be:

rule replicated_ruleset {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type rack
        step emit
}


If you only want two copies, the chooseleaf variant is correct.

Assuming you want three copies, neither of these is quite right. The use of "firstn 0" means "take the requested number of replicas in this selection step", so both of these would be asking for 3 racks, which obviously won't work when you only have 2.
(The "choose" variant doesn't work either, because you're telling it to select N racks and then emit those as the object locations! You'd need to add in a chooseleaf or set of choose calls underneath it.)

 

Or is there something more complicated I should be doing? I took a look at https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg19140.html and it sounds like this is what I want, but I've also seen examples like the following:

rule replicated_ruleset {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step choose firstn 2 type rack
        step chooseleaf firstn 0 type osd
        step emit
}

So this rule is saying "select 2 racks, and within each selected rack, choose N leaf OSDs". That's also not quite what you'd want: with "firstn 0" on the inner step, each selected rack is asked for the full replica count, so on a 3x pool the first rack alone can fill the result and all three copies end up in the same rack.

If all of your components are new enough, you can do the overselection hack:
 
rule replicated_ruleset {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step choose firstn 2 type rack
        step chooseleaf firstn 2 type osd
        step emit
}

That selects two racks (i.e., both) and then chooses 2 OSDs within each rack. If you're only asking for three copies, it truncates the last OSD off the list; and because it selects the racks in a different order each time, you'll get a good distribution across racks.
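
If you want to sanity-check a rule like this before injecting it, crushtool can simulate the mapping offline. A rough sketch of the round-trip (filenames here are just placeholders):

# pull down and decompile the current map
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
# ...edit the rule in crushmap.txt, then recompile...
crushtool -c crushmap.txt -o crushmap.new
# simulate placements for a 3x pool using rule 0, and eyeball
# that every mapping spans both racks
crushtool -i crushmap.new --test --rule 0 --num-rep 3 --show-mappings
# inject the new map once you're happy
ceph osd setcrushmap -i crushmap.new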
-Greg


As you might have noticed, I'm a little confused, so any assistance is greatly appreciated. And just to clarify once more, I want to make sure that it stores at least one copy in each rack. Advice on getting more granular is welcome as well, however, as there are pools with both 2x and 3x replication set up.

Cheers,
-
Stephen Mercier | Sr. Systems Architect
Attainia Capital Planning Solutions (ACPS)
O: (650)241-0567, 727 | TF: (866)288-2464, 727





_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
