Hi Jake,
This looks similar to something I ran into recently with version 16.2.7.
In my case, the active manager's log showed something about (non-)overlapping
roots, so I adjusted the crush rules. For me it was quite simple because the
cluster was all SSDs, but the default replicated_rule had never been set to
class ssd. What I did was set the correct device class.
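Roughly like this (just a sketch - the rule name replicated_ssd and <pool>
are placeholders, adjust them to your setup):

  # create a replicated rule restricted to the ssd device class
  ceph osd crush rule create-replicated replicated_ssd default host ssd
  # switch the affected pool over to that rule
  ceph osd pool set <pool> crush_rule replicated_ssd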
A little background on that: the default root usually has the id -1, but when
a crush rule chooses device class ssd, CRUSH uses the shadow root default~ssd,
which has a different id. In my case it was -2.
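You can list those shadow roots and their ids with (if I remember the flag
correctly):

  # shows the per-device-class shadow hierarchy, e.g. default~ssd
  ceph osd crush tree --show-shadow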
Possibly your pool `.mgr` uses a crush rule that doesn't specify a device
class but includes OSDs that are also covered by another crush rule in use,
so the autoscaler sees overlapping roots.
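Something like this should show which rule .mgr uses and whether that rule's
"take" step references the plain default root or a class-specific one (the
rule name below is only a guess, use whatever the first command returns):

  # which crush rule does the .mgr pool use?
  ceph osd pool get .mgr crush_rule
  # dump that rule and look at the "item_name" in its "take" step
  ceph osd crush rule dump replicated_rule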
Best regards
On 6/10/22 1:14 PM, Jake Grimmett wrote:
Dear All,
We are testing Quincy on a new large cluster. "ceph osd pool
autoscale-status" fails if we add a pool that uses a custom crush rule with a
specific device class, but it's fine if we don't specify the class:
[root@wilma-s1 ~]# ceph -v
ceph version 17.2.0 (43e2e60a7559d3f46c9d53f1ca875fd499a1e35e) quincy (stable)
[root@wilma-s1 ~]# cat /etc/redhat-release
AlmaLinux release 8.6 (Sky Tiger)
[root@wilma-s1 ~]# ceph osd pool autoscale-status
POOL       SIZE   TARGET SIZE  RATE  RAW CAPACITY  RATIO   TARGET RATIO  EFFECTIVE RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE  BULK
.mgr       6980k               3.0   7200T         0.0000                                 1.0        1              on         False
ec82pool   0                   1.25  7200T         0.0000                                 1.0     1024          32  off        False
[root@wilma-s1 ~]# ceph osd crush rule create-replicated ssd_replicated default host ssd
[root@wilma-s1 ~]# ceph osd pool create mds_ssd 32 32 ssd_replicated
pool 'mds_ssd' created
[root@wilma-s1 ~]# ceph df
--- RAW STORAGE ---
CLASS SIZE AVAIL USED RAW USED %RAW USED
hdd 7.0 PiB 6.9 PiB 126 TiB 126 TiB 1.75
ssd 2.7 TiB 2.7 TiB 3.1 GiB 3.1 GiB 0.11
TOTAL 7.0 PiB 6.9 PiB 126 TiB 126 TiB 1.75
--- POOLS ---
POOL ID PGS STORED OBJECTS USED %USED MAX AVAIL
.mgr 4 1 6.8 MiB 3 20 MiB 0 2.2 PiB
ec82pool 8 1024 0 B 0 0 B 0 5.2 PiB
mds_ssd 13 32 0 B 0 0 B 0 884 GiB
[root@wilma-s1 ~]# ceph osd pool autoscale-status
(exits with no output)
[root@wilma-s1 ~]# ceph osd pool delete mds_ssd mds_ssd --yes-i-really-really-mean-it
pool 'mds_ssd' removed
[root@wilma-s1 ~]# ceph osd pool autoscale-status
POOL       SIZE   TARGET SIZE  RATE  RAW CAPACITY  RATIO   TARGET RATIO  EFFECTIVE RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE  BULK
.mgr       6980k               3.0   7200T         0.0000                                 1.0        1              on         False
ec82pool   0                   1.25  7200T         0.0000                                 1.0     1024          32  off        False
Any ideas on what might be going on?
We get a similar problem if we specify hdd as the class.
Best regards
Jake