I then followed Eugen's advice with something like:

- create a new rule via the CLI which includes a device class
- dump the crushmap again and test the new rule with crushtool
- if the output is as expected, assign the new rule to a pool of your choice; I'd start with a less important one
- if everything's good, do the same for all necessary pools and wait for the remapping to finish
- no pool should be using the default "replicated_rule" now
- dump a fresh crushmap and decompile it
- add a "class hdd" entry to the default replicated_rule
- save and compile
- inject the modified crushmap (with this single change); nothing should happen in the cluster, since no pool should use the replicated_rule at that point

(A command-level sketch of these steps follows below.)

After I changed the pools to a new rule, the remapping started, the warning messages no longer appear, and the PG counts for the pools are increasing.

Thank you all for the help.

Rok
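For reference, a minimal sketch of that sequence for a replicated pool, assuming a new rule named "replicated_hdd" and a pool named "mypool" (both names are illustrative; the rule id for --test comes from "ceph osd crush rule dump"):

    # create a new replicated rule restricted to the hdd device class
    ceph osd crush rule create-replicated replicated_hdd default host hdd

    # dump and decompile the current crushmap, then sanity-check the new rule
    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt
    crushtool -i crushmap.bin --test --rule <rule_id> --num-rep 3 --show-mappings

    # assign the new rule to a less important pool first, then to the rest
    ceph osd pool set mypool crush_rule replicated_hdd

    # once no pool uses replicated_rule anymore: add "class hdd" to its
    # "take default" step in crushmap.txt, then recompile and inject
    crushtool -c crushmap.txt -o crushmap-new.bin
    ceph osd setcrushmap -i crushmap-new.bin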
On Tue, Dec 24, 2024 at 2:16 AM Anthony D'Atri <anthony.datri@xxxxxxxxx> wrote:

> If your NVMe OSDs have the `ssd` device class, doing what you suggest
> might not even result in any data movement.
>
> https://docs.ceph.com/en/reef/rados/operations/crush-map-edits/#migrating-from-a-legacy-ssd-rule-to-device-classes
>
> This page shows how to use the reclassify feature to help avoid typos when
> editing the CRUSH map. Using a CLI tool when feasible makes this sort of
> thing a lot safer, compared to back in the day when we had to text-edit
> everything by hand :nailbiting:. One can readily diff the before and after
> decompiled text CRUSH maps to ensure sanity before recompiling and injecting.
>
> I've done this myself multiple times since device classes became a thing.
>
> On Dec 23, 2024, at 5:05 PM, Rok Jaklič <rjaklic@xxxxxxxxx> wrote:
>
> I will try changing/adding class ssd to the replicated_rule tomorrow, even
> though I am a little hesitant to edit this rule, since it could mean that
> system data for rgw would "stay somewhere" if something goes wrong. I was
> much braver when changing the rule for EC32, where I separated the OSD data
> to just hdd, since "some data" was already on hdd.
>
> On Mon, Dec 23, 2024 at 4:12 PM Anthony D'Atri <anthony.datri@xxxxxxxxx> wrote:
>
> Agreed. The .mgr pool is a usual suspect here, especially when using
> Rook. When any pool is constrained to a device class, this kind of warning
> will happen if *all* pools don't specify one.
>
> Of course there's also the strategy of disabling the autoscaler, but that
> takes more analysis. We old farts are used to it, but it can be daunting
> for whippersnappers.
>
> On Dec 23, 2024, at 9:11 AM, Eugen Block <eblock@xxxxxx> wrote:
>
> Don't try to delete a root, that will definitely break something.
>
> Instead, check the crush rules which don't use a device class and use the
> reclassify feature of crushtool to modify the rules. This will trigger only
> a bit of data movement, not as much as simply changing the rule would.
>
> Zitat von Rok Jaklič <rjaklic@xxxxxxxxx>:
>
> I ran into a similar problem after changing a pool to use only the hdd
> class, following https://www.spinics.net/lists/ceph-users/msg84987.html.
> The data migrated successfully.
>
> I get warnings like:
>
> 2024-12-23T14:39:37.103+0100 7f949edad640  0 [pg_autoscaler WARNING root]
> pool default.rgw.buckets.index won't scale due to overlapping roots: {-1, -18}
> 2024-12-23T14:39:37.105+0100 7f949edad640  0 [pg_autoscaler WARNING root]
> pool default.rgw.buckets.data won't scale due to overlapping roots: {-2, -1, -18}
> 2024-12-23T14:39:37.107+0100 7f949edad640  0 [pg_autoscaler WARNING root]
> pool cephfs_metadata won't scale due to overlapping roots: {-2, -1, -18}
> 2024-12-23T14:39:37.111+0100 7f949edad640  0 [pg_autoscaler WARNING root]
> pool 1 contains an overlapping root -1... skipping scaling
> ...
>
> while the crush tree with shadow shows:
>
>  -2  hdd  1043.93188  root default~hdd
>  -4  hdd   151.82336      host ctplosd1~hdd
>   0  hdd     5.45798          osd.0
>   1  hdd     5.45798          osd.1
>   2  hdd     5.45798          osd.2
>   3  hdd     5.45798          osd.3
>   4  hdd     5.45798          osd.4
> ...
>  -1       1050.48230  root default
>  -3        153.27872      host ctplosd1
>   0  hdd     5.45798          osd.0
>   1  hdd     5.45798          osd.1
>   2  hdd     5.45798          osd.2
>   3  hdd     5.45798          osd.3
>   4  hdd     5.45798          osd.4
> ...
>
> Even though the crush rule, for example for
>
> pool 9 'default.rgw.buckets.data' erasure profile ec-32-profile size 5
> min_size 4 crush_rule 1 object_hash rjenkins pg_num 512 pgp_num 512
> autoscale_mode on last_change 320144 lfor 0/127784/214408 flags
> hashpspool,ec_overwrites stripe_width 12288 application rgw
>
> is set to:
>
> {
>     "rule_id": 1,
>     "rule_name": "ec32",
>     "type": 3,
>     "steps": [
>         {
>             "op": "set_chooseleaf_tries",
>             "num": 5
>         },
>         {
>             "op": "set_choose_tries",
>             "num": 100
>         },
>         {
>             "op": "take",
>             "item": -2,
>             "item_name": "default~hdd"
>         },
>         {
>             "op": "chooseleaf_indep",
>             "num": 0,
>             "type": "host"
>         },
>         {
>             "op": "emit"
>         }
>     ]
> },
>
> I still get the warning messages.
>
> Is there a way I can check whether a particular "root" is used somewhere,
> other than going through "ceph osd pool ls detail" and looking into each
> crush rule?
>
> Can I somehow delete the "old" root default?
>
> Would it be safe to change pg_num manually even with overlapping roots?
>
> Rok
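To answer the first question above with something concrete, a quick sketch using the standard CLI (output formats vary slightly by release):

    # show each rule's name and the root it takes;
    # "default" means no device class, "default~hdd" means class hdd
    ceph osd crush rule dump | grep -E '"rule_name"|"item_name"'

    # show which crush rule each pool uses
    ceph osd pool ls detail | grep -Eo "^pool [0-9]+ '[^']+'|crush_rule [0-9]+"

Any pool whose rule still takes the plain "default" root keeps both roots in play, which is exactly what triggers the overlapping-roots warning.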
> On Wed, Jan 25, 2023 at 12:03 PM Massimo Sgaravatto <
> massimo.sgaravatto@xxxxxxxxx> wrote:
>
> I tried the following on a small testbed first:
>
> ceph osd erasure-code-profile set profile-4-2-hdd k=4 m=2
>     crush-failure-domain=host crush-device-class=hdd
> ceph osd crush rule create-erasure ecrule-4-2-hdd profile-4-2-hdd
> ceph osd pool set ecpool-4-2 crush_rule ecrule-4-2-hdd
>
> and indeed, after having applied this change to all the EC pools, the
> autoscaler doesn't complain anymore.
>
> Thanks a lot!
>
> Cheers, Massimo
>
> On Tue, Jan 24, 2023 at 7:02 PM Eugen Block <eblock@xxxxxx> wrote:
>
> Hi,
>
> what you can't change with EC pools is the EC profile; the pool's
> ruleset you can change. The fix is the same as for the replicated
> pools: assign a ruleset with the hdd class, and after some data movement
> the autoscaler should not complain anymore.
>
> Regards
> Eugen
>
> Zitat von Massimo Sgaravatto <massimo.sgaravatto@xxxxxxxxx>:
>
> Dear all,
>
> I have just changed the crush rule for all the replicated pools in the
> following way:
>
> ceph osd crush rule create-replicated replicated_hdd default host hdd
> ceph osd pool set <poolname> crush_rule replicated_hdd
>
> See also this [*] thread.
>
> Before applying this change, these pools were all using the
> replicated_ruleset rule, where the class is not specified.
>
> I am noticing now a problem with the autoscaler: "ceph osd pool
> autoscale-status" doesn't report any output, and the mgr log complains
> about overlapping roots:
>
> [pg_autoscaler ERROR root] pool xyz has overlapping roots: {-18, -1}
>
> Indeed:
>
> # ceph osd crush tree --show-shadow
> ID   CLASS  WEIGHT      TYPE NAME
> -18    hdd  1329.26501  root default~hdd
> -17    hdd   329.14154      rack Rack11-PianoAlto~hdd
> -15    hdd    54.56085          host ceph-osd-04~hdd
>  30    hdd     5.45609              osd.30
>  31    hdd     5.45609              osd.31
> ...
>  -1         1329.26501  root default
>  -7          329.14154      rack Rack11-PianoAlto
>  -8           54.56085          host ceph-osd-04
>  30    hdd     5.45609              osd.30
>  31    hdd     5.45609              osd.31
> ...
>
> I have already read about this behavior, but I have no clear idea how
> to fix the problem.
>
> I read somewhere that the problem happens when there are rules that
> force some pools to use only one class, and there are also pools which
> do not make any distinction between device classes.
>
> All the replicated pools are using the replicated_hdd rule, but I also
> have some EC pools which are using a profile where the class is not
> specified. As far as I understand, I can't force these pools to use
> only the hdd class: according to the docs I can't change this profile
> to specify the hdd class (or at least the change wouldn't be applied
> to the existing EC pools).
>
> Any suggestions?
>
> The crush map is available at https://cernbox.cern.ch/s/gIyjbQbmoTFHCrr,
> if you want to have a look.
>
> Many thanks, Massimo
>
> [*] https://www.mail-archive.com/ceph-users@xxxxxxx/msg18534.html

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
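For completeness, the reclassify workflow Eugen and Anthony refer to above looks roughly like this (a sketch following the linked docs page; file names are illustrative, and depending on the map a --set-subtree-class step may also be needed):

    ceph osd getcrushmap -o original.bin
    crushtool -i original.bin --reclassify --reclassify-root default hdd -o adjusted.bin
    # confirm that few or no mappings change before injecting
    crushtool -i original.bin --compare adjusted.bin
    ceph osd setcrushmap -i adjusted.bin

The idea is that --reclassify-root rewrites rules taking the plain "default" root to take "default~hdd" instead, so the classless root drops out of use without the large data movement a naive rule edit can cause.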