Re: Problems with autoscaler (overlapping roots) after changing the pool class

I will try changing/adding class ssd to replicated_rule tomorrow, even
though, for some reason, I am a little hesitant to edit this rule, since it
could mean that system data for rgw would "stay somewhere" if something
goes wrong. I was much braver when I changed the rule for EC32, where I
restricted the data to hdd only, since "some data" was already on hdd.
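
A minimal sketch of one way to do that, creating a class-specific rule and
pointing the affected pools at it (the rule name replicated_ssd below is a
placeholder, and whether ssd or hdd is the right class depends on where those
pools should end up):

ceph osd crush rule create-replicated replicated_ssd default host ssd
ceph osd pool set <poolname> crush_rule replicated_ssd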


On Mon, Dec 23, 2024 at 4:12 PM Anthony D'Atri <anthony.datri@xxxxxxxxx>
wrote:

> Agreed.  The .mgr pool is a usual suspect here, especially when using
> Rook.  When any pool is constrained to a device class, this kind of warning
> will happen unless *all* pools specify one.
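>
> For example, a quick check of which rule the .mgr pool is using (sketch;
> substitute any other pool name as needed):
>
> ceph osd pool get .mgr crush_rule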
>
> Of course there’s also the strategy of disabling the autoscaler, but that
> takes more analysis.  We old farts are used to it, but it can be daunting
> for whippersnappers.
>
> > On Dec 23, 2024, at 9:11 AM, Eugen Block <eblock@xxxxxx> wrote:
> >
> > Don't try to delete a root, that will definitely break something.
> > Instead, check the crush rules which don't use a device class and use
> > crushtool's reclassify feature to modify the rules. This will trigger
> > only a bit of data movement, but not as much as a simple change of the
> > rule would.
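> >
> > Roughly the workflow from the Ceph device-class migration docs, as a sketch
> > (file names are placeholders, and the class below assumes the devices under
> > the default root are all hdd; verify the flags against crushtool's man page
> > for your release):
> >
> > ceph osd getcrushmap -o original
> > crushtool -i original --reclassify \
> >     --set-subtree-class default hdd \
> >     --reclassify-root default hdd \
> >     -o adjusted
> > crushtool -i original --compare adjusted   # preview how much data would move
> > ceph osd setcrushmap -i adjusted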
> >
> > Quoting Rok Jaklič <rjaklic@xxxxxxxxx>:
> >
> >> I got a similar problem after changing pool class to use only hdd following
> >> https://www.spinics.net/lists/ceph-users/msg84987.html. Data migrated
> >> successfully.
> >>
> >> I get warnings like:
> >> 2024-12-23T14:39:37.103+0100 7f949edad640  0 [pg_autoscaler WARNING root]
> >> pool default.rgw.buckets.index won't scale due to overlapping roots: {-1, -18}
> >> 2024-12-23T14:39:37.105+0100 7f949edad640  0 [pg_autoscaler WARNING root]
> >> pool default.rgw.buckets.data won't scale due to overlapping roots: {-2, -1, -18}
> >> 2024-12-23T14:39:37.107+0100 7f949edad640  0 [pg_autoscaler WARNING root]
> >> pool cephfs_metadata won't scale due to overlapping roots: {-2, -1, -18}
> >> 2024-12-23T14:39:37.111+0100 7f949edad640  0 [pg_autoscaler WARNING root]
> >> pool 1 contains an overlapping root -1... skipping scaling
> >> ...
> >>
> >> while the crush tree with shadow buckets shows:
> >> -2    hdd  1043.93188  root default~hdd
> >> -4    hdd   151.82336      host ctplosd1~hdd
> >>  0    hdd     5.45798          osd.0
> >>  1    hdd     5.45798          osd.1
> >>  2    hdd     5.45798          osd.2
> >>  3    hdd     5.45798          osd.3
> >>  4    hdd     5.45798          osd.4
> >> ...
> >> -1         1050.48230  root default
> >> -3          153.27872      host ctplosd1
> >>  0    hdd     5.45798          osd.0
> >>  1    hdd     5.45798          osd.1
> >>  2    hdd     5.45798          osd.2
> >>  3    hdd     5.45798          osd.3
> >>  4    hdd     5.45798          osd.4
> >> ...
> >>
> >> and even though the crush rule, for example for
> >>
> >> pool 9 'default.rgw.buckets.data' erasure profile ec-32-profile size 5
> >> min_size 4 crush_rule 1 object_hash rjenkins pg_num 512 pgp_num 512
> >> autoscale_mode on last_change 320144 lfor 0/127784/214408 flags
> >> hashpspool,ec_overwrites stripe_width 12288 application rgw
> >>
> >> is set to:
> >>        {
> >>            "rule_id": 1,
> >>            "rule_name": "ec32",
> >>            "type": 3,
> >>            "steps": [
> >>                {
> >>                    "op": "set_chooseleaf_tries",
> >>                    "num": 5
> >>                },
> >>                {
> >>                    "op": "set_choose_tries",
> >>                    "num": 100
> >>                },
> >>                {
> >>                    "op": "take",
> >>                    "item": -2,
> >>                    "item_name": "default~hdd"
> >>                },
> >>                {
> >>                    "op": "chooseleaf_indep",
> >>                    "num": 0,
> >>                    "type": "host"
> >>                },
> >>                {
> >>                    "op": "emit"
> >>                }
> >>            ]
> >>        },
> >>
> >> I still get the warning messages.
> >>
> >> Is there a way I can check if a particular "root" is used somewhere other
> >> than going through ceph osd pool ls detail and looking into each crush rule?
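> >>
> >> A sketch of one way to see which root each rule takes (assumes jq is
> >> available):
> >>
> >> ceph osd crush rule dump | jq '.[] | {rule: .rule_name, takes: [.steps[] | select(.op == "take") | .item_name]}'
> >>
> >> Rules whose take step still names plain "default" (instead of e.g.
> >> "default~hdd") are the ones keeping the old root in play.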
> >>
> >> Can I somehow delete "old" root default?
> >>
> >> Would it be safe to change pg_num manually even with overlapping roots?
> >>
> >> Rok
> >>
> >>
> >> On Wed, Jan 25, 2023 at 12:03 PM Massimo Sgaravatto <
> >> massimo.sgaravatto@xxxxxxxxx> wrote:
> >>
> >>> I tried the following on a small testbed first:
> >>>
> >>> ceph osd erasure-code-profile set profile-4-2-hdd k=4 m=2
> >>> crush-failure-domain=host crush-device-class=hdd
> >>> ceph osd crush rule create-erasure ecrule-4-2-hdd profile-4-2-hdd
> >>> ceph osd pool set ecpool-4-2 crush_rule ecrule-4-2-hdd
> >>>
> >>> and indeed, after having applied this change to all the EC pools, the
> >>> autoscaler doesn't complain anymore.
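> >>>
> >>> A quick way to verify, as a sketch:
> >>>
> >>> ceph osd pool autoscale-status
> >>>
> >>> Once no rule takes the plain "default" root anymore, this lists every pool
> >>> again instead of returning empty output.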
> >>>
> >>> Thanks a lot !
> >>>
> >>> Cheers, Massimo
> >>>
> >>> On Tue, Jan 24, 2023 at 7:02 PM Eugen Block <eblock@xxxxxx> wrote:
> >>>
> >>> > Hi,
> >>> >
> >>> > what you can't change with EC pools is the EC profile; the pool's
> >>> > ruleset you can change. The fix is the same as for the replicated
> >>> > pools: assign a ruleset with the hdd class, and after some data
> >>> > movement the autoscaler should not complain anymore.
> >>> >
> >>> > Regards
> >>> > Eugen
> >>> >
> >>> > Quoting Massimo Sgaravatto <massimo.sgaravatto@xxxxxxxxx>:
> >>> >
> >>> > > Dear all
> >>> > >
> >>> > > I have just changed the crush rule for all the replicated pools in
> >>> > > the following way:
> >>> > >
> >>> > > ceph osd crush rule create-replicated replicated_hdd default host hdd
> >>> > > ceph osd pool set <poolname> crush_rule replicated_hdd
> >>> > >
> >>> > > See also this [*] thread.
> >>> > > Before applying this change, these pools were all using
> >>> > > the replicated_ruleset rule, where the class is not specified.
> >>> > >
> >>> > >
> >>> > >
> >>> > > I am now noticing a problem with the autoscaler: "ceph osd pool
> >>> > > autoscale-status" doesn't report any output, and the mgr log
> >>> > > complains about overlapping roots:
> >>> > >
> >>> > >  [pg_autoscaler ERROR root] pool xyz has overlapping roots: {-18, -1}
> >>> > >
> >>> > >
> >>> > > Indeed:
> >>> > >
> >>> > > # ceph osd crush tree --show-shadow
> >>> > > ID   CLASS  WEIGHT      TYPE NAME
> >>> > > -18    hdd  1329.26501  root default~hdd
> >>> > > -17    hdd   329.14154      rack Rack11-PianoAlto~hdd
> >>> > > -15    hdd    54.56085          host ceph-osd-04~hdd
> >>> > >  30    hdd     5.45609              osd.30
> >>> > >  31    hdd     5.45609              osd.31
> >>> > > ...
> >>> > > ...
> >>> > >  -1         1329.26501  root default
> >>> > >  -7          329.14154      rack Rack11-PianoAlto
> >>> > >  -8           54.56085          host ceph-osd-04
> >>> > >  30    hdd     5.45609              osd.30
> >>> > >  31    hdd     5.45609              osd.31
> >>> > > ...
> >>> > >
> >>> > > I have already read about this behavior, but I have no clear idea
> >>> > > how to fix the problem.
> >>> > >
> >>> > > I read somewhere that the problem happens when there are rules that
> >>> > > force some pools to use only one class while there are also pools
> >>> > > which do not make any distinction between device classes.
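> >>> > >
> >>> > > A sketch of how to spot such rules (file names are placeholders):
> >>> > >
> >>> > > ceph osd getcrushmap -o crushmap.bin
> >>> > > crushtool -d crushmap.bin -o crushmap.txt
> >>> > > grep 'step take' crushmap.txt
> >>> > >
> >>> > > Rules showing "step take default" with no class are the class-agnostic
> >>> > > ones; class-specific rules show e.g. "step take default class hdd".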
> >>> > >
> >>> > >
> >>> > > All the replicated pools are now using the replicated_hdd rule, but
> >>> > > I also have some EC pools which use a profile where the class is not
> >>> > > specified. As far as I understand, I can't force these pools to use
> >>> > > only the hdd class: according to the docs I can't change this profile
> >>> > > to specify the hdd class (or at least the change wouldn't be applied
> >>> > > to the existing EC pools).
> >>> > >
> >>> > > Any suggestions ?
> >>> > >
> >>> > > The crush map is available at https://cernbox.cern.ch/s/gIyjbQbmoTFHCrr,
> >>> > > if you want to have a look.
> >>> > >
> >>> > > Many thanks, Massimo
> >>> > >
> >>> > > [*] https://www.mail-archive.com/ceph-users@xxxxxxx/msg18534.html
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



