Re: Problems with autoscaler (overlapping roots) after changing the pool class

If your NVMe OSDs have the `ssd` device class, doing what you suggest might not even result in any data movement.
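
A quick way to confirm that (sketch only; whether the NVMe OSDs ended up in the ssd class or a custom one depends on how they were created):

ceph osd crush class ls                  # classes present in the map
ceph osd crush class ls-osd ssd          # OSDs carrying the ssd class
ceph osd crush tree --show-shadow        # per-class shadow hierarchy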

https://docs.ceph.com/en/reef/rados/operations/crush-map-edits/#migrating-from-a-legacy-ssd-rule-to-device-classes 
That page shows how to use crushtool's reclassify feature, which avoids the typos that tend to creep in when hand-editing the CRUSH map.  Using a CLI tool when feasible makes this sort of thing a lot safer than back in the day, when we had to text-edit everything by hand :nailbiting:.  You can also diff the decompiled text CRUSH maps before and after to sanity-check the result before recompiling and injecting it.
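
Roughly, the round trip looks like this (a sketch, not a recipe: the --reclassify-root arguments below assume the stock "default" root and the hdd class, so adjust them to your map, and check the diff and the --compare output before injecting anything):

ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o before.txt
crushtool -i crushmap.bin --reclassify --reclassify-root default hdd -o crushmap.new.bin
crushtool -d crushmap.new.bin -o after.txt
diff -u before.txt after.txt                           # eyeball the rewrite
crushtool -i crushmap.bin --compare crushmap.new.bin   # how many mappings would move
ceph osd setcrushmap -i crushmap.new.bin               # only once you're happy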

I’ve done this myself multiple times since device classes became a thing.



> On Dec 23, 2024, at 5:05 PM, Rok Jaklič <rjaklic@xxxxxxxxx> wrote:
> 
> I will try changing/adding the ssd class to replicated_rule tomorrow, even
> though I am for some reason a little hesitant to edit this rule, since it
> could mean that rgw's system data would "stay somewhere" if something goes
> wrong. I was much braver when changing the rule for EC32, where I restricted
> the data to hdd only, since "some data" was already on hdd.
> 
> 
> On Mon, Dec 23, 2024 at 4:12 PM Anthony D'Atri <anthony.datri@xxxxxxxxx>
> wrote:
> 
>> Agreed.  The .mgr pool is a usual suspect here, especially when using
>> Rook.  When any pool is constrained to a device class, this kind of warning
>> will appear unless *all* pools specify one.
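>> 
>> A minimal illustration of the fix (sketch only; the rule name here is arbitrary and it assumes hdd devices, adjust both to taste):
>> 
>> ceph osd crush rule create-replicated replicated_hdd default host hdd
>> ceph osd pool set .mgr crush_rule replicated_hdd
>> ceph osd pool ls detail | grep crush_rule    # confirm every pool uses a class-aware rule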
>> 
>> Of course there’s also the strategy of disabling the autoscaler, but that
>> takes more analysis.  We old farts are used to it, but it can be daunting
>> for whippersnappers.
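>> 
>> (For reference, turning it off is roughly this, per pool or as the default for new pools:)
>> 
>> ceph osd pool set <poolname> pg_autoscale_mode off
>> ceph config set global osd_pool_default_pg_autoscale_mode off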
>> 
>>> On Dec 23, 2024, at 9:11 AM, Eugen Block <eblock@xxxxxx> wrote:
>>> 
>>> Don't try to delete a root, that will definitely break something.
>>> Instead, check which crush rules don't use a device class and use
>>> crushtool's reclassify feature to modify them. This will still trigger
>>> some data movement, but not as much as simply changing the rule would.
>>> 
>>> Quoting Rok Jaklič <rjaklic@xxxxxxxxx>:
>>> 
>>>> I got a similar problem after changing a pool's class to use only hdd,
>>>> following https://www.spinics.net/lists/ceph-users/msg84987.html. Data
>>>> migrated successfully.
>>>> 
>>>> I get warnings like:
>>>> 2024-12-23T14:39:37.103+0100 7f949edad640  0 [pg_autoscaler WARNING root] pool default.rgw.buckets.index won't scale due to overlapping roots: {-1, -18}
>>>> 2024-12-23T14:39:37.105+0100 7f949edad640  0 [pg_autoscaler WARNING root] pool default.rgw.buckets.data won't scale due to overlapping roots: {-2, -1, -18}
>>>> 2024-12-23T14:39:37.107+0100 7f949edad640  0 [pg_autoscaler WARNING root] pool cephfs_metadata won't scale due to overlapping roots: {-2, -1, -18}
>>>> 2024-12-23T14:39:37.111+0100 7f949edad640  0 [pg_autoscaler WARNING root] pool 1 contains an overlapping root -1... skipping scaling
>>>> ...
>>>> 
>>>> while ceph osd crush tree --show-shadow shows:
>>>> -2    hdd  1043.93188  root default~hdd
>>>> -4    hdd   151.82336      host ctplosd1~hdd
>>>> 0    hdd     5.45798          osd.0
>>>> 1    hdd     5.45798          osd.1
>>>> 2    hdd     5.45798          osd.2
>>>> 3    hdd     5.45798          osd.3
>>>> 4    hdd     5.45798          osd.4
>>>> ...
>>>> -1         1050.48230  root default
>>>> -3          153.27872      host ctplosd1
>>>> 0    hdd     5.45798          osd.0
>>>> 1    hdd     5.45798          osd.1
>>>> 2    hdd     5.45798          osd.2
>>>> 3    hdd     5.45798          osd.3
>>>> 4    hdd     5.45798          osd.4
>>>> ...
>>>> 
>>>> and even though the crush rule for a pool such as
>>>> 
>>>> pool 9 'default.rgw.buckets.data' erasure profile ec-32-profile size 5
>>>> min_size 4 crush_rule 1 object_hash rjenkins pg_num 512 pgp_num 512
>>>> autoscale_mode on last_change 320144 lfor 0/127784/214408 flags
>>>> hashpspool,ec_overwrites stripe_width 12288 application rgw
>>>> 
>>>> is set to:
>>>>       {
>>>>           "rule_id": 1,
>>>>           "rule_name": "ec32",
>>>>           "type": 3,
>>>>           "steps": [
>>>>               {
>>>>                   "op": "set_chooseleaf_tries",
>>>>                   "num": 5
>>>>               },
>>>>               {
>>>>                   "op": "set_choose_tries",
>>>>                   "num": 100
>>>>               },
>>>>               {
>>>>                   "op": "take",
>>>>                   "item": -2,
>>>>                   "item_name": "default~hdd"
>>>>               },
>>>>               {
>>>>                   "op": "chooseleaf_indep",
>>>>                   "num": 0,
>>>>                   "type": "host"
>>>>               },
>>>>               {
>>>>                   "op": "emit"
>>>>               }
>>>>           ]
>>>>       },
>>>> 
>>>> I still get the warning messages.
>>>> 
>>>> Is there a way I can check whether a particular "root" is used somewhere
>>>> other than going through ceph osd pool ls detail and looking into each
>>>> crush rule?
>>>> 
>>>> Can I somehow delete the "old" root default?
>>>> 
>>>> Would it be safe to change pg_num manually even with overlapping roots?
>>>> 
>>>> Rok
>>>> 
>>>> 
>>>> On Wed, Jan 25, 2023 at 12:03 PM Massimo Sgaravatto <
>>>> massimo.sgaravatto@xxxxxxxxx> wrote:
>>>> 
>>>>> I tried the following on a small testbed first:
>>>>> 
>>>>> ceph osd erasure-code-profile set profile-4-2-hdd k=4 m=2
>>>>> crush-failure-domain=host crush-device-class=hdd
>>>>> ceph osd crush rule create-erasure ecrule-4-2-hdd profile-4-2-hdd
>>>>> ceph osd pool set ecpool-4-2 crush_rule ecrule-4-2-hdd
>>>>> 
>>>>> and indeed, after applying this change to all the EC pools, the
>>>>> autoscaler no longer complains.
>>>>> 
>>>>> Thanks a lot !
>>>>> 
>>>>> Cheers, Massimo
>>>>> 
>>>>> On Tue, Jan 24, 2023 at 7:02 PM Eugen Block <eblock@xxxxxx> wrote:
>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> what you can't change with EC pools is the EC profile; the pool's
>>>>>> ruleset you can change. The fix is the same as for the replicated
>>>>>> pools: assign a ruleset with the hdd class, and after some data
>>>>>> movement the autoscaler should not complain anymore.
>>>>>> 
>>>>>> Regards
>>>>>> Eugen
>>>>>> 
>>>>>> Quoting Massimo Sgaravatto <massimo.sgaravatto@xxxxxxxxx>:
>>>>>> 
>>>>>>> Dear all
>>>>>>> 
>>>>>>> I have just changed the crush rule for all the replicated pools in the
>>>>>>> following way:
>>>>>>> 
>>>>>>> ceph osd crush rule create-replicated replicated_hdd default host hdd
>>>>>>> ceph osd pool set <poolname> crush_rule replicated_hdd
>>>>>>> 
>>>>>>> See also this thread [*].
>>>>>>> Before applying this change, these pools were all using
>>>>>>> the replicated_ruleset rule, where no device class is specified.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> I am noticing now a problem with the autoscaler: "ceph osd pool
>>>>>>> autoscale-status" doesn't report any output and the mgr log complains
>>>>>>> about overlapping roots:
>>>>>>> 
>>>>>>> [pg_autoscaler ERROR root] pool xyz has overlapping roots: {-18, -1}
>>>>>>> 
>>>>>>> 
>>>>>>> Indeed:
>>>>>>> 
>>>>>>> # ceph osd crush tree --show-shadow
>>>>>>> ID   CLASS  WEIGHT      TYPE NAME
>>>>>>> -18    hdd  1329.26501  root default~hdd
>>>>>>> -17    hdd   329.14154      rack Rack11-PianoAlto~hdd
>>>>>>> -15    hdd    54.56085          host ceph-osd-04~hdd
>>>>>>> 30    hdd     5.45609              osd.30
>>>>>>> 31    hdd     5.45609              osd.31
>>>>>>> ...
>>>>>>> ...
>>>>>>> -1         1329.26501  root default
>>>>>>> -7          329.14154      rack Rack11-PianoAlto
>>>>>>> -8           54.56085          host ceph-osd-04
>>>>>>> 30    hdd     5.45609              osd.30
>>>>>>> 31    hdd     5.45609              osd.31
>>>>>>> ...
>>>>>>> 
>>>>>>> I have already read about this behavior, but I have no clear idea how
>>>>>>> to fix the problem.
>>>>>>> 
>>>>>>> I read somewhere that the problem happens when there are rules that
>>>>>>> force some pools to use only one class, while there are also pools that
>>>>>>> do not make any distinction between device classes.
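>>>>>>> 
>>>>>>> A rough way to see which rules still take the plain root instead of a class-specific shadow root (just grepping the rule dump):
>>>>>>> 
>>>>>>> ceph osd crush rule dump | grep -E '"rule_name"|"item_name"'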
>>>>>>> 
>>>>>>> 
>>>>>>> All the replicated pools are now using the replicated_hdd rule, but I
>>>>>>> also have some EC pools which use a profile where the class is not
>>>>>>> specified. As far as I understand, I can't force these pools to use only
>>>>>>> the hdd class: according to the docs I can't change this profile to
>>>>>>> specify the hdd class (or at least the change wouldn't be applied to the
>>>>>>> existing EC pools).
>>>>>>> 
>>>>>>> Any suggestions?
>>>>>>> 
>>>>>>> The crush map is available at https://cernbox.cern.ch/s/gIyjbQbmoTFHCrr,
>>>>>>> if you want to have a look.
>>>>>>> 
>>>>>>> Many thanks, Massimo
>>>>>>> 
>>>>>>> [*] https://www.mail-archive.com/ceph-users@xxxxxxx/msg18534.html

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



