MANY_OBJECTS_PER_PG on one pool (cephfs_metadata)

Hello Ceph community,

 

Since this morning I have had a MANY_OBJECTS_PER_PG warning on one pool, cephfs_metadata:

 

# ceph health detail
HEALTH_WARN 1 pools have many more objects per pg than average
[WRN] MANY_OBJECTS_PER_PG: 1 pools have many more objects per pg than average
    pool cephfs_metadata objects per pg (154151) is more than 10.0215 times cluster average (15382)
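
If I read the "10.0215 times" figure correctly, the threshold behind this warning seems to be the mon_pg_warn_max_object_skew option (default 10). That is only my assumption, and I have not touched it; I believe it could be inspected or raised roughly like this (in recent releases the option is apparently read by the mgr rather than the mon):

# assumption on my part that mon_pg_warn_max_object_skew is the option
# behind the warning and that it is consumed by the mgr on recent releases
ceph config get mgr mon_pg_warn_max_object_skew
ceph config set mgr mon_pg_warn_max_object_skew 20   # would only hide the warning, not change the PG layout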

 

I have autoscaling enabled on all pools:

 

# ceph osd pool autoscale-status
POOL                          SIZE  TARGET SIZE  RATE  RAW CAPACITY   RATIO  TARGET RATIO  EFFECTIVE RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE
device_health_metrics        9523k                3.0        26827G  0.0000                                  1.0       1              on
cephfs_data                  5389G                2.0        26827G  0.4018                                  1.0     512              on
cephfs_metadata             19365M                2.0        26827G  0.0014                                  4.0      16              on
.rgw.root                    1323                 3.0        26827G  0.0000                                  1.0      32              on
default.rgw.log             23552                 3.0        26827G  0.0000                                  1.0      32              on
default.rgw.control             0                 3.0        26827G  0.0000                                  1.0      32              on
default.rgw.meta            11911                 3.0        26827G  0.0000                                  4.0       8              on
default.rgw.buckets.index       0                 3.0        26827G  0.0000                                  4.0       8              on
default.rgw.buckets.data    497.0G                3.0        26827G  0.0556                                  1.0      32              on
kubernetes                  177.2G                2.0        26827G  0.0132                                  1.0      32              on
default.rgw.buckets.non-ec    432                 3.0        26827G  0.0000                                  1.0      32              on

 

Currently pg_num is 16 for the cephfs_metadata pool, but the autoscaler does not set any NEW PG_NUM for it.
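
For reference, I also double-checked the values on the pool directly, something like this (I assume pg_num_min can be queried the same way it can be set):

# current settings on the metadata pool
ceph osd pool get cephfs_metadata pg_num
ceph osd pool get cephfs_metadata pg_autoscale_mode
ceph osd pool get cephfs_metadata pg_num_min   # assumed gettable; it shows as 16 in the osd dump below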

 

Here are the replicated sizes of all my pools:

 

# ceph osd dump | grep  'replicated size'
pool 1 'device_health_metrics' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 189372 flags hashpspool stripe_width 0 pg_num_min 1 application mgr_devicehealth
pool 10 'cephfs_data' replicated size 2 min_size 1 crush_rule 1 object_hash rjenkins pg_num 512 pgp_num 512 autoscale_mode on last_change 189346 lfor 0/0/183690 flags hashpspool,selfmanaged_snaps stripe_width 0 application cephfs
pool 11 'cephfs_metadata' replicated size 2 min_size 1 crush_rule 1 object_hash rjenkins pg_num 16 pgp_num 16 autoscale_mode on last_change 187861 lfor 0/187861/187859 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 16 recovery_priority 5 application cephfs
pool 18 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 5265 flags hashpspool stripe_width 0 application rgw
pool 19 'default.rgw.log' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 5267 flags hashpspool stripe_width 0 application rgw
pool 20 'default.rgw.control' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 5269 flags hashpspool stripe_width 0 application rgw
pool 21 'default.rgw.meta' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 5398 lfor 0/5398/5396 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 8 application rgw
pool 22 'default.rgw.buckets.index' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 7491 lfor 0/7491/7489 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 8 application rgw
pool 23 'default.rgw.buckets.data' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 7500 flags hashpspool stripe_width 0 application rgw
pool 24 'kubernetes' replicated size 2 min_size 1 crush_rule 1 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 189363 lfor 0/0/7560 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
pool 25 'default.rgw.buckets.non-ec' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 23983 flags hashpspool stripe_width 0 application rgw

 

Why is the autoscaler not acting to increase the pg_num of the pool that is in warning?

 

As PGCalc is no longer available on the Ceph website, do you think it is a good idea to manually increase the pg_num of cephfs_metadata? If so, which value should I set?
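
If manual adjustment is the right approach, I assume it would be something along these lines (the value 64 below is just a placeholder I have not validated, and I understand the autoscaler might shrink the pool back unless pg_num_min is raised as well or autoscaling is turned off for that pool):

# sketch only: manually raise pg_num on the metadata pool
# (64 is a placeholder; with autoscale_mode still "on" I assume pg_num_min
# also has to be raised, otherwise the autoscaler may scale it back down)
ceph osd pool set cephfs_metadata pg_num_min 64
ceph osd pool set cephfs_metadata pg_num 64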

 

I have 18 OSDs.
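
In case it helps, here is the rough PGCalc-style estimate I came up with for cephfs_metadata; the 100 PGs-per-OSD target and the 5% share for the metadata pool are both guesses on my part, not validated numbers:

# rough PGCalc-style estimate (the percentages are my own guesses)
osds=18
replica_size=2           # cephfs_metadata is replicated size 2
target_pgs_per_osd=100
metadata_share_pct=5     # guessed share of data/PGs for the metadata pool
echo $(( osds * target_pgs_per_osd * metadata_share_pct / 100 / replica_size ))
# -> 45, which would round up to the next power of two, i.e. 64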

 

Thanks for the help

 

Best Regards,

 

Edouard FAZENDA

Technical Support

 

 

Chemin du Curé-Desclouds 2, CH-1226 THONEX  +41 (0)22 869 04 40

 

www.csti.ch

 
