I will try changing/adding the ssd class to replicated_rule tomorrow, even
though I am a little hesitant to edit this rule, since it could mean that
system data for rgw would "stay somewhere" if something goes wrong. I was
much braver when I changed the rule for EC32 and restricted that data to
hdd only, since "some data" was already on hdd.

On Mon, Dec 23, 2024 at 4:12 PM Anthony D'Atri <anthony.datri@xxxxxxxxx> wrote:

> Agreed. The .mgr pool is a usual suspect here, especially when using
> Rook. When any pool is constrained to a device class, this kind of warning
> will happen unless *all* pools specify one.
>
> Of course there's also the strategy of disabling the autoscaler, but that
> takes more analysis. We old farts are used to it, but it can be daunting
> for whippersnappers.
>
> > On Dec 23, 2024, at 9:11 AM, Eugen Block <eblock@xxxxxx> wrote:
> >
> > Don't try to delete a root, that will definitely break something.
> > Instead, check the crush rules which don't use a device class and use
> > the reclassify feature of crushtool to modify the rules. This will
> > trigger only a bit of data movement, but not as much as simply changing
> > the rules would.
> >
> > Quoting Rok Jaklič <rjaklic@xxxxxxxxx>:
> >
> >> I ran into a similar problem after changing a pool to use only the hdd
> >> class, following https://www.spinics.net/lists/ceph-users/msg84987.html.
> >> Data migrated successfully.
> >>
> >> I get warnings like:
> >>
> >> 2024-12-23T14:39:37.103+0100 7f949edad640  0 [pg_autoscaler WARNING root] pool default.rgw.buckets.index won't scale due to overlapping roots: {-1, -18}
> >> 2024-12-23T14:39:37.105+0100 7f949edad640  0 [pg_autoscaler WARNING root] pool default.rgw.buckets.data won't scale due to overlapping roots: {-2, -1, -18}
> >> 2024-12-23T14:39:37.107+0100 7f949edad640  0 [pg_autoscaler WARNING root] pool cephfs_metadata won't scale due to overlapping roots: {-2, -1, -18}
> >> 2024-12-23T14:39:37.111+0100 7f949edad640  0 [pg_autoscaler WARNING root] pool 1 contains an overlapping root -1... skipping scaling
> >> ...
> >>
> >> while the crush tree with shadow entries shows:
> >>
> >>  -2   hdd  1043.93188  root default~hdd
> >>  -4   hdd   151.82336      host ctplosd1~hdd
> >>   0   hdd     5.45798          osd.0
> >>   1   hdd     5.45798          osd.1
> >>   2   hdd     5.45798          osd.2
> >>   3   hdd     5.45798          osd.3
> >>   4   hdd     5.45798          osd.4
> >> ...
> >>  -1        1050.48230  root default
> >>  -3         153.27872      host ctplosd1
> >>   0   hdd     5.45798          osd.0
> >>   1   hdd     5.45798          osd.1
> >>   2   hdd     5.45798          osd.2
> >>   3   hdd     5.45798          osd.3
> >>   4   hdd     5.45798          osd.4
> >> ...
> >>
> >> and even though the crush rule for, for example,
> >>
> >> pool 9 'default.rgw.buckets.data' erasure profile ec-32-profile size 5
> >> min_size 4 crush_rule 1 object_hash rjenkins pg_num 512 pgp_num 512
> >> autoscale_mode on last_change 320144 lfor 0/127784/214408 flags
> >> hashpspool,ec_overwrites stripe_width 12288 application rgw
> >>
> >> is set to:
> >>
> >> {
> >>     "rule_id": 1,
> >>     "rule_name": "ec32",
> >>     "type": 3,
> >>     "steps": [
> >>         {
> >>             "op": "set_chooseleaf_tries",
> >>             "num": 5
> >>         },
> >>         {
> >>             "op": "set_choose_tries",
> >>             "num": 100
> >>         },
> >>         {
> >>             "op": "take",
> >>             "item": -2,
> >>             "item_name": "default~hdd"
> >>         },
> >>         {
> >>             "op": "chooseleaf_indep",
> >>             "num": 0,
> >>             "type": "host"
> >>         },
> >>         {
> >>             "op": "emit"
> >>         }
> >>     ]
> >> },
> >>
> >> I still get the warning messages.
> >>
> >> Is there a way I can check whether a particular "root" is used anywhere,
> >> other than going through ceph osd pool ls detail and looking at each
> >> crush rule?
> >>
> >> Can I somehow delete the "old" root default?
> >>
> >> Would it be safe to change pg_num manually even with overlapping roots?
> >>
> >> Rok
> >>
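For anyone following along, here is a rough sketch of how one might answer
the "which rules still use the plain root" question and of the crushtool
reclassify step Eugen refers to. Treat it as an outline rather than a tested
recipe: the flag names follow the upstream crushtool documentation,
"original"/"adjusted" are just placeholder file names, and it assumes the
class-less rules should end up on the hdd class, so double-check against
your release and keep a backup before injecting anything.

# Which rules still "take" the plain root? Any rule whose item_name is a
# bare "default" (no ~class suffix) is one the autoscaler will complain about.
ceph osd crush rule dump | grep -E '"rule_name"|"item_name"'

# Map pools to rules (match the crush_rule ids against the list above):
ceph osd pool ls detail | grep crush_rule

# Rewrite the class-less rules offline instead of editing them one by one:
ceph osd getcrushmap -o original
crushtool -d original -o original.txt      # keep a readable copy as backup
crushtool -i original --reclassify \
          --reclassify-root default hdd \
          -o adjusted
# See the crushtool docs for --set-subtree-class / --reclassify-bucket if
# some OSDs have no class yet or legacy per-class trees need merging.

# Show how many mappings would change before committing to the new map:
crushtool -i original --compare adjusted
ceph osd setcrushmap -i adjusted

The --compare step is the useful part: it reports up front how many mappings
would change, which is how you can tell whether the reclassified map really
moves "only a bit" of data.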
> >> On Wed, Jan 25, 2023 at 12:03 PM Massimo Sgaravatto <massimo.sgaravatto@xxxxxxxxx> wrote:
> >>
> >>> I tried the following on a small testbed first:
> >>>
> >>> ceph osd erasure-code-profile set profile-4-2-hdd k=4 m=2 crush-failure-domain=host crush-device-class=hdd
> >>> ceph osd crush rule create-erasure ecrule-4-2-hdd profile-4-2-hdd
> >>> ceph osd pool set ecpool-4-2 crush_rule ecrule-4-2-hdd
> >>>
> >>> and indeed, after applying this change to all the EC pools, the
> >>> autoscaler doesn't complain anymore.
> >>>
> >>> Thanks a lot!
> >>>
> >>> Cheers, Massimo
> >>>
> >>> On Tue, Jan 24, 2023 at 7:02 PM Eugen Block <eblock@xxxxxx> wrote:
> >>>
> >>> > Hi,
> >>> >
> >>> > what you can't change with EC pools is the EC profile; the pool's
> >>> > ruleset you can change. The fix is the same as for the replicated
> >>> > pools: assign a ruleset with the hdd class, and after some data
> >>> > movement the autoscaler should not complain anymore.
> >>> >
> >>> > Regards
> >>> > Eugen
> >>> >
> >>> > Quoting Massimo Sgaravatto <massimo.sgaravatto@xxxxxxxxx>:
> >>> >
> >>> > > Dear all
> >>> > >
> >>> > > I have just changed the crush rule for all the replicated pools in
> >>> > > the following way:
> >>> > >
> >>> > > ceph osd crush rule create-replicated replicated_hdd default host hdd
> >>> > > ceph osd pool set <poolname> crush_rule replicated_hdd
> >>> > >
> >>> > > See also this [*] thread.
> >>> > > Before applying this change, these pools were all using the
> >>> > > replicated_ruleset rule, where the class is not specified.
> >>> > >
> >>> > > I am now noticing a problem with the autoscaler: "ceph osd pool
> >>> > > autoscale-status" doesn't report any output and the mgr log
> >>> > > complains about overlapping roots:
> >>> > >
> >>> > > [pg_autoscaler ERROR root] pool xyz has overlapping roots: {-18, -1}
> >>> > >
> >>> > > Indeed:
> >>> > >
> >>> > > # ceph osd crush tree --show-shadow
> >>> > > ID   CLASS  WEIGHT      TYPE NAME
> >>> > > -18   hdd   1329.26501  root default~hdd
> >>> > > -17   hdd    329.14154      rack Rack11-PianoAlto~hdd
> >>> > > -15   hdd     54.56085          host ceph-osd-04~hdd
> >>> > >  30   hdd      5.45609              osd.30
> >>> > >  31   hdd      5.45609              osd.31
> >>> > > ...
> >>> > > ...
> >>> > >  -1         1329.26501  root default
> >>> > >  -7          329.14154      rack Rack11-PianoAlto
> >>> > >  -8           54.56085          host ceph-osd-04
> >>> > >  30   hdd      5.45609              osd.30
> >>> > >  31   hdd      5.45609              osd.31
> >>> > > ...
> >>> > >
> >>> > > I have already read about this behavior, but I have no clear idea
> >>> > > how to fix the problem.
> >>> > >
> >>> > > I read somewhere that the problem happens when there are rules that
> >>> > > force some pools to use only one class and there are also pools
> >>> > > which do not make any distinction between device classes.
> >>> > >
> >>> > > All the replicated pools are using the replicated_hdd rule, but I
> >>> > > also have some EC pools which are using a profile where the class is
> >>> > > not specified. As far as I understand, I can't force these pools to
> >>> > > use only the hdd class: according to the docs, I can't change this
> >>> > > profile to specify the hdd class (or at least the change wouldn't be
> >>> > > applied to the existing EC pools).
> >>> > >
> >>> > > Any suggestions?
> >>> > >
> >>> > > The crush map is available at
> >>> > > https://cernbox.cern.ch/s/gIyjbQbmoTFHCrr, if you want to have a look.
> >>> > >
> >>> > > Many thanks, Massimo
> >>> > >
> >>> > > [*] https://www.mail-archive.com/ceph-users@xxxxxxx/msg18534.html
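To close the loop on both threads, a short sketch of how the end state can be
verified once every pool (replicated and EC) points at a class-aware rule,
plus the "disable the autoscaler" alternative Anthony mentions above. Again
only a sketch: <poolname> is a placeholder, and the option names are the
standard ones from recent releases, so verify them on your version first.

# Every rule should now take a shadow root (default~hdd etc.); a bare
# "default" left here means some pool can still overlap roots.
ceph osd crush rule dump | grep -E '"rule_name"|"item_name"'
ceph osd crush tree --show-shadow

# With the overlap gone, this should print its usual table again instead
# of the mgr logging "overlapping roots" warnings:
ceph osd pool autoscale-status

# Alternative: turn autoscaling off and manage pg_num by hand,
# either per existing pool or as the default for new pools.
ceph osd pool set <poolname> pg_autoscale_mode off
ceph config set global osd_pool_default_pg_autoscale_mode off

Whether to fix the rules or disable the autoscaler is a trade-off: the former
triggers some data movement once, the latter leaves pg_num sizing to you from
then on.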
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx