Re: classes crush rules new cluster

Andre,

see responses inline.

Quoting Andre Tann <atann@xxxxxxxxxxxx>:

Hi Eugen,

On 29.11.24 at 11:31, Eugen Block wrote:

step set_chooseleaf_tries 5 -> stick to defaults, usually works (number of max attempts to find suitable OSDs)

Why do we need more than one attempt to find an OSD? Why is the result different if we walk through a rule more than once?

There have been cases with a large number of OSDs where crush "gave up too soon". Although I haven't read about that in quite a while, it may or may not still be an issue.
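For context, this is roughly where those tunables sit in an EC rule (the rule name and id below are just placeholders, not from your cluster):

  rule ec_example {
      id 2
      type erasure
      step set_chooseleaf_tries 5
      step set_choose_tries 100
      step take default
      step chooseleaf indep 0 type host
      step emit
  }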

step take default class test -> "default" is the usual default crush root (check 'ceph osd tree'); you can specify other roots if you have them

Where are these classes defined? Or is "default class test" the name of a root? Most probably not.

You define those classes. By default, Ceph creates a "default" entry point into the crush tree of type "root":

ceph osd tree | head -2
ID  CLASS  WEIGHT   TYPE NAME            STATUS  REWEIGHT  PRI-AFF
-1         0.14648  root default

You can create multiple roots with arbitrary names. Those roots can be addressed in crush rules. Before there were device classes, users split their trees into multiple roots, for example one for HDD, one for SSD devices.
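For example (the OSD ids and rule name here are made up), assigning a class and creating a rule restricted to it would look something like:

ceph osd crush rm-device-class osd.0 osd.1 osd.2   # only needed if a class is already set
ceph osd crush set-device-class test osd.0 osd.1 osd.2
ceph osd crush rule create-replicated repl_test default host test

The last command creates a replicated rule on root "default" with failure domain "host", restricted to device class "test".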

Could I also say step take default type host?

I haven't tried that, I would assume that the entry point still has to be a bucket of type "root". I encourage you to play around in a lab cluster to get familiar with crushmaps and especially the crushtool, you'll benefit from it.
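If it helps, the usual crushtool round trip looks like this (file names and the rule id are just placeholders):

ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
# edit crushmap.txt, then recompile
crushtool -c crushmap.txt -o crushmap-new.bin
# dry-run a rule (here rule id 1 with 4 replicas) without touching the cluster
crushtool -i crushmap-new.bin --test --show-mappings --rule 1 --num-rep 4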

What are the keywords that are allowed after the root's name?

Fair question, I'm only aware of "class XYZ", so the device classes. I haven't checked in detail though.

step chooseleaf indep 0 type host -> within bucket "root" (from "step take default") choose {pool-num-replicas} hosts

What if I did exactly this, but have nested fault domains (e.g. racks > hosts)? Would the rule then pick {pool-num-replicas} hosts out of different racks, even though this rule doesn't mention racks anywhere?

Since I don't have racks in my lab cluster, I don't specify them. You need to modify your rule(s) according to your infrastructure, my example was just a simple one from one of my lab clusters.

But what if I have size=4 but only two racks? Would the picked hosts spread evenly across the two racks, or randomly, e.g. one host in one rack and three in the other, or all four in one rack?

You can (and most likely will) end up with the random result if you don't specifically tell crush what to do.
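For your size=4 / two racks example, a rule like this (name and id made up) should give you two hosts per rack, if I'm not mistaken:

  rule rack_even {
      id 3
      type replicated
      step take default
      step choose firstn 2 type rack
      step chooseleaf firstn 2 type host
      step emit
  }

But again, verify the resulting mappings with 'crushtool --test' before relying on it.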

Assume a pool with size=4, could I say

  step take default
  step choose firstn 1 type row
  step choose firstn 3 type rack
  step chooseleaf firstn 0 type host
  step emit

Meaning:
- force all chunks of a pg in one row
- force all chunks in exactly three racks inside this row
- out of these three racks, pick 4 hosts

I don't want to say that the latter makes much sense, I just wonder if it would work that way.

I think it would, but again, give it a try. You can create "virtual" rows and racks; just add the respective buckets to the crushmap (of your test cluster):

ceph osd crush add-bucket row1 row root=default
added bucket row1 type row to location {root=default}

ceph osd crush add-bucket rack1 rack row=row1
added bucket rack1 type rack to location {row=row1}

ceph osd crush add-bucket rack2 rack row=row1
added bucket rack2 type rack to location {row=row1}

ceph osd crush add-bucket rack3 rack row=row1
added bucket rack3 type rack to location {row=row1}

Then move some of your hosts into the racks with 'ceph osd crush move ...' and test your crush rules.
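For example (hypothetical host names):

ceph osd crush move host1 rack=rack1
ceph osd crush move host2 rack=rack2
ceph osd crush move host3 rack=rack3
ceph osd crush tree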



--
Andre Tann
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx

