Andre,
see responses inline.
Quoting Andre Tann <atann@xxxxxxxxxxxx>:
Ahoi Eugen,
On 29.11.24 at 11:31, Eugen Block wrote:
step set_chooseleaf_tries 5 -> stick to defaults, usually works
(maximum number of attempts to find suitable OSDs)
Why do we need more than one attempt to find an OSD? Why is the
result different if we walk through a rule more than once?
There have been cases with a large number of OSDs where crush "gave up
too soon". I haven't read about that in quite a while, though, so it
may or may not still be an issue.
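For reference, those tunable steps usually sit at the top of an
erasure-coded rule. A minimal sketch (rule name and id are
placeholders, the values are the usual defaults):

rule ec-example {
    id 2
    type erasure
    step set_chooseleaf_tries 5
    step set_choose_tries 100
    step take default
    step chooseleaf indep 0 type host
    step emit
}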
step take default class test -> "default" is the usual default
crush root (check 'ceph osd tree'); you can specify other roots if
you have them
Where are these classes defined? Or is "default class test" the name
of a root? Most probably not.
You define those classes. By default, Ceph creates a "default" entry
point into the crush tree of type "root":
ceph osd tree | head -2
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 0.14648 root default
You can create multiple roots with arbitrary names. Those roots can be
addressed in crush rules. Before there were device classes, users
split their trees into multiple roots, for example one for HDD, one
for SSD devices.
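If you'd rather work with device classes than with extra roots,
something like this should do the trick (rule and class names are just
examples):

ceph osd crush class ls
ceph osd crush rule create-replicated replicated-ssd default host ssd

The first command lists the existing device classes, the second creates
a replicated rule that only selects OSDs of class "ssd" under the
"default" root, with "host" as failure domain.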
Could I also say step take default type host?
I haven't tried that; I would assume that the entry point still has to
be a bucket of type "root". I encourage you to play around in a lab
cluster to get familiar with crushmaps, and especially with crushtool;
you'll benefit from it.
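Roughly, the cycle looks like this (file names are arbitrary, rule id
and num-rep depend on your pool):

ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
(edit crushmap.txt)
crushtool -c crushmap.txt -o crushmap-new.bin
crushtool -i crushmap-new.bin --test --rule 1 --num-rep 4 --show-mappings

That shows which OSDs a rule would select without touching the running
cluster; only 'ceph osd setcrushmap -i crushmap-new.bin' would actually
inject the new map.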
What are the keywords that are allowed after the root's name?
Fair question; I'm only aware of "class XYZ", i.e. the device classes.
I haven't checked in detail, though.
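So for example both of these should be valid, the second one restricted
to one device class (assuming a class named "hdd" exists):

step take default
step take default class hdd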
step chooseleaf indep 0 type host -> within the "default" root (from
"step take default"), choose {pool-num-replicas} hosts
What if I did exactly this, but have nested fault domains (e.g.
racks > hosts)? Would the rule then pick {pool-num-replicas} hosts
out of different racks, even though this rule doesn't mention racks
anywhere?
Since I don't have racks in my lab cluster, I don't specify them. You
need to modify your rule(s) according to your infrastructure; my
example was just a simple one from one of my lab clusters.
But what if I have size=4 but only two racks? Would the picked
hosts be spread evenly across the two racks, or randomly, e.g. 1 host
in one rack and 3 in the other, or even all 4 in one rack?
You can (and most likely will) end up with a random distribution if you
don't specifically tell crush what to do.
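If you want a deterministic split, the rule has to spell it out. For
size=4 across two racks, a sketch could look like this (name and id are
placeholders):

rule replicated-two-racks {
    id 5
    type replicated
    step take default
    step choose firstn 2 type rack
    step chooseleaf firstn 2 type host
    step emit
}

That picks 2 racks and then 2 hosts in each of them, so you get a 2+2
split instead of a random 3+1 or 4+0.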
Assume a pool with size=4; could I say
step take default
step choose firstn 1 type row
step choose firstn 3 type rack
step chooseleaf firstn 0 type host
Meaning:
- force all chunks of a pg in one row
- force all chunks in exactly three racks inside this row
- out of these three racks, pick 4 hosts
I don't want to say that the latter makes much sense; I just wonder
whether it would work that way.
I think it would, but again, give it a try. You can create "virtual"
rows and racks; just add the respective buckets to the crushmap (of
your test cluster):
ceph osd crush add-bucket row1 row root=default
added bucket row1 type row to location {root=default}
ceph osd crush add-bucket rack1 rack row=row1
added bucket rack1 type rack to location {row=row1}
ceph osd crush add-bucket rack2 rack row=row1
added bucket rack2 type rack to location {row=row1}
ceph osd crush add-bucket rack3 rack row=row1
added bucket rack3 type rack to location {row=row1}
Then move some of your hosts into the racks with 'ceph osd crush
move ...' and test your crush rules.
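For example (host names are placeholders):

ceph osd crush move host1 rack=rack1
ceph osd crush move host2 rack=rack2

Then dump the map again and run it through 'crushtool --test ...
--show-mappings' as sketched above to see where your rule would place
the replicas.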
--
Andre Tann
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx