Don't try to delete a root; that will definitely break something.
Instead, check which crush rules don't use a device class and use
crushtool's reclassify feature to modify those rules. This will still
trigger a bit of data movement, but not as much as simply changing the
rules would.
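Roughly, the workflow looks like this (a sketch from memory, so
double-check the reclassify options against the docs for your release):

    # see which rules still "take" the plain root instead of a shadow
    # root like default~hdd
    ceph osd crush rule dump | grep -E '"rule_name"|"item_name"'

    # export the crush map, reclassify the legacy root into the hdd
    # class, compare the old and new mappings to estimate the data
    # movement, then inject the adjusted map
    ceph osd getcrushmap -o original.bin
    crushtool -i original.bin --reclassify \
        --reclassify-root default hdd -o adjusted.bin
    crushtool -i original.bin --compare adjusted.bin
    ceph osd setcrushmap -i adjusted.bin

(If some OSDs have no device class set you may additionally need
--set-subtree-class, see the crushtool docs.)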
Quoting Rok Jaklič <rjaklic@xxxxxxxxx>:
I ran into a similar problem after changing a pool to use only the hdd
class, following https://www.spinics.net/lists/ceph-users/msg84987.html.
The data migrated successfully.
I get warnings like:
2024-12-23T14:39:37.103+0100 7f949edad640 0 [pg_autoscaler WARNING root] pool default.rgw.buckets.index won't scale due to overlapping roots: {-1, -18}
2024-12-23T14:39:37.105+0100 7f949edad640 0 [pg_autoscaler WARNING root] pool default.rgw.buckets.data won't scale due to overlapping roots: {-2, -1, -18}
2024-12-23T14:39:37.107+0100 7f949edad640 0 [pg_autoscaler WARNING root] pool cephfs_metadata won't scale due to overlapping roots: {-2, -1, -18}
2024-12-23T14:39:37.111+0100 7f949edad640 0 [pg_autoscaler WARNING root] pool 1 contains an overlapping root -1... skipping scaling
...
while the crush tree with shadow buckets (ceph osd crush tree --show-shadow) shows:
-2 hdd 1043.93188 root default~hdd
-4 hdd 151.82336 host ctplosd1~hdd
0 hdd 5.45798 osd.0
1 hdd 5.45798 osd.1
2 hdd 5.45798 osd.2
3 hdd 5.45798 osd.3
4 hdd 5.45798 osd.4
...
-1 1050.48230 root default
-3 153.27872 host ctplosd1
0 hdd 5.45798 osd.0
1 hdd 5.45798 osd.1
2 hdd 5.45798 osd.2
3 hdd 5.45798 osd.3
4 hdd 5.45798 osd.4
...
and even though the crush rule for, for example,
pool 9 'default.rgw.buckets.data' erasure profile ec-32-profile size 5 min_size 4 crush_rule 1 object_hash rjenkins pg_num 512 pgp_num 512 autoscale_mode on last_change 320144 lfor 0/127784/214408 flags hashpspool,ec_overwrites stripe_width 12288 application rgw
is set to:
{
    "rule_id": 1,
    "rule_name": "ec32",
    "type": 3,
    "steps": [
        {
            "op": "set_chooseleaf_tries",
            "num": 5
        },
        {
            "op": "set_choose_tries",
            "num": 100
        },
        {
            "op": "take",
            "item": -2,
            "item_name": "default~hdd"
        },
        {
            "op": "chooseleaf_indep",
            "num": 0,
            "type": "host"
        },
        {
            "op": "emit"
        }
    ]
},
I still get the warning messages.
Is there a way to check whether a particular "root" is used anywhere, other
than going through ceph osd pool ls detail and looking at each crush rule?
Can I somehow delete the "old" root default?
Would it be safe to change pg_num manually even with overlapping roots?
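For reference, this is roughly how I map pools to rules and rules to
roots at the moment (assuming jq; the JSON field names are what my
release prints, so they may need adjusting):

    # which root does each rule take?
    ceph osd crush rule dump | \
        jq -r '.[] | .rule_name + " -> " + (.steps[] | select(.op == "take") | .item_name)'

    # which rule does each pool use?
    ceph osd pool ls detail -f json | \
        jq -r '.[] | .pool_name + " -> crush_rule " + (.crush_rule | tostring)'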
Rok
On Wed, Jan 25, 2023 at 12:03 PM Massimo Sgaravatto <massimo.sgaravatto@xxxxxxxxx> wrote:
I tried the following on a small testbed first:
ceph osd erasure-code-profile set profile-4-2-hdd k=4 m=2 \
    crush-failure-domain=host crush-device-class=hdd
ceph osd crush rule create-erasure ecrule-4-2-hdd profile-4-2-hdd
ceph osd pool set ecpool-4-2 crush_rule ecrule-4-2-hdd
and indeed, after applying this change to all the EC pools, the
autoscaler doesn't complain anymore.
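For the record, the check is simply rerunning:

    # once no pool maps to the plain "default" root any more, this
    # reports output for all pools again
    ceph osd pool autoscale-status

and the overlapping-roots messages are gone from the mgr log.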
Thanks a lot!
Cheers, Massimo
On Tue, Jan 24, 2023 at 7:02 PM Eugen Block <eblock@xxxxxx> wrote:
> Hi,
>
> what you can't change with EC pools is the EC profile; the pool's
> ruleset you can change. The fix is the same as for the replicated
> pools: assign a ruleset with the hdd class, and after some data
> movement the autoscaler should not complain anymore.
>
> Regards
> Eugen
>
> Quoting Massimo Sgaravatto <massimo.sgaravatto@xxxxxxxxx>:
>
> > Dear all
> >
> > I have just changed the crush rule for all the replicated pools in the
> > following way:
> >
> > ceph osd crush rule create-replicated replicated_hdd default host hdd
> > ceph osd pool set <poolname> crush_rule replicated_hdd
> >
> > See also this [*] thread.
> > Before applying this change, these pools were all using
> > the replicated_ruleset rule where the class is not specified.
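> > (For completeness, this is visible in the rule dump itself; something
> > like
> >
> >     ceph osd crush rule dump replicated_ruleset
> >
> > shows the "take" step with "item_name": "default" rather than a shadow
> > root such as "default~hdd".)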
> >
> >
> >
> > I am now noticing a problem with the autoscaler: "ceph osd pool
> > autoscale-status" doesn't report any output, and the mgr log complains
> > about overlapping roots:
> >
> > [pg_autoscaler ERROR root] pool xyz has overlapping roots: {-18, -1}
> >
> >
> > Indeed:
> >
> > # ceph osd crush tree --show-shadow
> > ID CLASS WEIGHT TYPE NAME
> > -18 hdd 1329.26501 root default~hdd
> > -17 hdd 329.14154 rack Rack11-PianoAlto~hdd
> > -15 hdd 54.56085 host ceph-osd-04~hdd
> > 30 hdd 5.45609 osd.30
> > 31 hdd 5.45609 osd.31
> > ...
> > ...
> > -1 1329.26501 root default
> > -7 329.14154 rack Rack11-PianoAlto
> > -8 54.56085 host ceph-osd-04
> > 30 hdd 5.45609 osd.30
> > 31 hdd 5.45609 osd.31
> > ...
> >
> > I have already read about this behavior, but I have no clear idea how
> > to fix the problem.
> >
> > I read somewhere that the problem happens when there are rules that
> > force some pools to only use one class and there are also pools which
> > do not make any distinction between device classes.
> >
> >
> > All the replicated pools are using the replicated_hdd rule, but I also
> > have some EC pools which are using a profile where the class is not
> > specified. As far as I understand, I can't force these pools to use
> > only the hdd class: according to the docs I can't change this profile
> > to specify the hdd class (or at least the change wouldn't be applied
> > to the existing EC pools).
> >
> > Any suggestions?
> >
> > The crush map is available at https://cernbox.cern.ch/s/gIyjbQbmoTFHCrr,
> > if you want to have a look.
> >
> > Many thanks, Massimo
> >
> > [*] https://www.mail-archive.com/ceph-users@xxxxxxx/msg18534.html
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx