Re: Problems with autoscaler (overlapping roots) after changing the pool class

I tried the following on a small testbed first:

ceph osd erasure-code-profile set profile-4-2-hdd k=4 m=2 \
    crush-failure-domain=host crush-device-class=hdd
ceph osd crush rule create-erasure ecrule-4-2-hdd profile-4-2-hdd
ceph osd pool set ecpool-4-2 crush_rule ecrule-4-2-hdd

and indeed, after applying this change to all the EC pools, the
autoscaler no longer complains.
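
In case it is useful to anyone else, this is roughly how I would
double-check that the fix took effect (just a sketch using the pool and
rule names from the commands above; adjust them to your own setup):

# confirm the pool now uses the device-class specific rule
ceph osd pool get ecpool-4-2 crush_rule
# the "take" step of the rule should reference the hdd shadow root (default~hdd)
ceph osd crush rule dump ecrule-4-2-hdd
# once every pool uses a class-specific rule, this prints a row per pool again
ceph osd pool autoscale-status

Once all pools (replicated and EC) resolve to the same shadow root
(default~hdd) instead of a mix of default and default~hdd, the
"overlapping roots" error from the pg_autoscaler goes away.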

Thanks a lot!

Cheers, Massimo

On Tue, Jan 24, 2023 at 7:02 PM Eugen Block <eblock@xxxxxx> wrote:

> Hi,
>
> what you can't change with EC pools is the EC profile, but the pool's
> ruleset you can change. The fix is the same as for the replicated
> pools: assign a ruleset with the hdd class, and after some data
> movement the autoscaler should not complain anymore.
>
> Regards
> Eugen
>
> Quoting Massimo Sgaravatto <massimo.sgaravatto@xxxxxxxxx>:
>
> > Dear all
> >
> > I have just changed the crush rule for all the replicated pools in the
> > following way:
> >
> > ceph osd crush rule create-replicated replicated_hdd default host hdd
> > ceph osd pool set <poolname> crush_rule replicated_hdd
> >
> > See also this [*] thread.
> > Before applying this change, these pools were all using the
> > replicated_ruleset rule, where the class is not specified.
> >
> >
> >
> > I am now noticing a problem with the autoscaler: "ceph osd pool
> > autoscale-status" doesn't report any output, and the mgr log
> > complains about overlapping roots:
> >
> >  [pg_autoscaler ERROR root] pool xyz has overlapping roots: {-18, -1}
> >
> >
> > Indeed:
> >
> > # ceph osd crush tree --show-shadow
> > ID   CLASS  WEIGHT      TYPE NAME
> > -18    hdd  1329.26501  root default~hdd
> > -17    hdd   329.14154      rack Rack11-PianoAlto~hdd
> > -15    hdd    54.56085          host ceph-osd-04~hdd
> >  30    hdd     5.45609              osd.30
> >  31    hdd     5.45609              osd.31
> > ...
> > ...
> >  -1         1329.26501  root default
> >  -7          329.14154      rack Rack11-PianoAlto
> >  -8           54.56085          host ceph-osd-04
> >  30    hdd     5.45609              osd.30
> >  31    hdd     5.45609              osd.31
> > ...
> >
> > I have already read about this behavior but I have no clear idea how
> > to fix the problem.
> >
> > I read somewhere that the problem happens when there are rules that
> > force some pools to use only one class and there are also pools whose
> > rules do not make any distinction between device classes.
> >
> >
> > All the replicated pools are using the replicated_hdd rule, but I
> > also have some EC pools which are using a profile where the class is
> > not specified. As far as I understand, I can't force these pools to
> > use only the hdd class: according to the docs I can't change this
> > profile to specify the hdd class (or at least the change wouldn't be
> > applied to the existing EC pools).
> >
> > Any suggestions?
> >
> > The crush map is available at https://cernbox.cern.ch/s/gIyjbQbmoTFHCrr,
> > if you want to have a look.
> >
> > Many thanks, Massimo
> >
> > [*] https://www.mail-archive.com/ceph-users@xxxxxxx/msg18534.html
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



