Hi,

I have a problem in my cluster. I set up a cache tier in front of the RGW data pool: three hosts for the cache tier and three hosts for the data tier, with SSDs for cache and HDDs for data. I set a 20 GiB quota on the cache pool. When one of the cache-tier hosts went offline, the cluster raised the warnings below. I decreased the quota to 10 GiB, but that did not resolve them, and the dashboard does not show the correct PG status (1 active+undersized). What is happening in my cluster? Why does this not resolve? Can anyone explain this situation?

------------------------------------------------
## ceph -s
opcpmfpsksa0101: Mon May 30 12:05:12 2022

  cluster:
    id:     54d2b1d6-207e-11ec-8c73-005056ac51bf
    health: HEALTH_WARN
            1 hosts fail cephadm check
            1 pools have many more objects per pg than average
            Degraded data redundancy: 1750/53232 objects degraded (3.287%), 1 pg degraded, 1 pg undersized
            too many PGs per OSD (259 > max 250)

  services:
    mon: 3 daemons, quorum opcpmfpsksa0101,opcpmfpsksa0103,opcpmfpsksa0105 (age 3d)
    mgr: opcpmfpsksa0101.apmwdm(active, since 5h)
    osd: 12 osds: 10 up (since 95m), 10 in (since 85m)
    rgw: 2 daemons active (2 hosts, 1 zones)

  data:
    pools:   9 pools, 865 pgs
    objects: 17.74k objects, 41 GiB
    usage:   128 GiB used, 212 GiB / 340 GiB avail
    pgs:     1750/53232 objects degraded (3.287%)
             864 active+clean
             1   active+undersized+degraded

-----------------------------
## ceph health detail
HEALTH_WARN 1 hosts fail cephadm check; 1 pools have many more objects per pg than average; Degraded data redundancy: 1665/56910 objects degraded (2.926%), 1 pg degraded, 1 pg undersized; too many PGs per OSD (259 > max 250)
[WRN] CEPHADM_HOST_CHECK_FAILED: 1 hosts fail cephadm check
    host opcpcfpsksa0101 (10.56.12.210) failed check: Failed to connect to opcpcfpsksa0101 (10.56.12.210).
    Please make sure that the host is reachable and accepts connections using the cephadm SSH key
    To add the cephadm SSH key to the host:
    > ceph cephadm get-pub-key > ~/ceph.pub
    > ssh-copy-id -f -i ~/ceph.pub root@10.56.12.210
    To check that the host is reachable open a new shell with the --no-hosts flag:
    > cephadm shell --no-hosts
    Then run the following:
    > ceph cephadm get-ssh-config > ssh_config
    > ceph config-key get mgr/cephadm/ssh_identity_key > ~/cephadm_private_key
    > chmod 0600 ~/cephadm_private_key
    > ssh -F ssh_config -i ~/cephadm_private_key root@10.56.12.210
[WRN] MANY_OBJECTS_PER_PG: 1 pools have many more objects per pg than average
    pool cache-pool objects per pg (1665) is more than 79.2857 times cluster average (21)
[WRN] PG_DEGRADED: Degraded data redundancy: 1665/56910 objects degraded (2.926%), 1 pg degraded, 1 pg undersized
    pg 9.0 is stuck undersized for 88m, current state active+undersized+degraded, last acting [10,11]
[WRN] TOO_MANY_PGS: too many PGs per OSD (259 > max 250)
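For reference, the quota was set and later lowered with the standard set-quota command; from memory the exact invocations were along these lines (byte values are 20 GiB and 10 GiB, matching the max_bytes shown in the pool listing further down), and the stuck PG and the cache pool's CRUSH rule can be inspected directly:

    # how the byte quota on the cache pool was set and later reduced (a sketch)
    ceph osd pool set-quota cache-pool max_bytes 21474836480   # 20 GiB
    ceph osd pool set-quota cache-pool max_bytes 10737418240   # 10 GiB
    ceph osd pool get-quota cache-pool                         # verify what is applied

    # dig into the stuck PG (pg 9.0, last acting [10,11])
    ceph pg 9.0 query              # peering state, up/acting sets
    ceph osd crush rule dump       # cache-pool uses crush_rule 2

A pool quota only limits how much data the pool may hold; it has no influence on how many replicas CRUSH can place. With size 3 on cache-pool and only two cache hosts with up OSDs (osd.0 and osd.9 on opcpcfpsksa0101 are down, see below), CRUSH presumably cannot find a third host for the third replica, which would explain why pg 9.0 stays active+undersized no matter what the quota is set to.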
--------------------------------------------------
## ceph osd df tree
ID   CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS  TYPE NAME
-1          0.35156         -  340 GiB  128 GiB  121 GiB   12 MiB  6.9 GiB  212 GiB  37.58  1.00    -          root default
-3          0.01959         -      0 B      0 B      0 B      0 B      0 B      0 B      0     0    -              host opcpcfpsksa0101
 0    ssd   0.00980         0      0 B      0 B      0 B      0 B      0 B      0 B      0     0    0    down          osd.0
 9    ssd   0.00980         0      0 B      0 B      0 B      0 B      0 B      0 B      0     0    0    down          osd.9
-5          0.01959         -   20 GiB  5.1 GiB  4.0 GiB  588 KiB  1.1 GiB   15 GiB  25.29  0.67    -              host opcpcfpsksa0103
 7    ssd   0.00980   0.85004   10 GiB  483 MiB   75 MiB  539 KiB  407 MiB  9.5 GiB   4.72  0.13    3      up          osd.7
10    ssd   0.00980   0.55011   10 GiB  4.6 GiB  3.9 GiB   49 KiB  703 MiB  5.4 GiB  45.85  1.22    5      up          osd.10
-16         0.01959         -   20 GiB  5.5 GiB  4.0 GiB  542 KiB  1.5 GiB   15 GiB  27.28  0.73    -              host opcpcfpsksa0105
 8    ssd   0.00980   0.70007   10 GiB  851 MiB   75 MiB  121 KiB  775 MiB  9.2 GiB   8.31  0.22   10      up          osd.8
11    ssd   0.00980   0.45013   10 GiB  4.6 GiB  3.9 GiB  421 KiB  742 MiB  5.4 GiB  46.24  1.23    5      up          osd.11
-10         0.09760         -  100 GiB   39 GiB   38 GiB  207 KiB  963 MiB   61 GiB  38.59  1.03    -              host opcsdfpsksa0101
 1    hdd   0.04880   1.00000   50 GiB   19 GiB   19 GiB  207 KiB  639 MiB   31 GiB  38.77  1.03  424      up          osd.1
12    hdd   0.04880   1.00000   50 GiB   19 GiB   19 GiB      0 B  323 MiB   31 GiB  38.40  1.02  430      up          osd.12
-13         0.09760         -  100 GiB   39 GiB   38 GiB  4.9 MiB  1.8 GiB   61 GiB  39.41  1.05    -              host opcsdfpsksa0103
 2    hdd   0.04880   1.00000   50 GiB   20 GiB   20 GiB  2.6 MiB  703 MiB   30 GiB  40.42  1.08  428      up          osd.2
 3    hdd   0.04880   1.00000   50 GiB   19 GiB   18 GiB  2.3 MiB  1.1 GiB   31 GiB  38.39  1.02  429      up          osd.3
-19         0.09760         -  100 GiB   39 GiB   38 GiB  5.3 MiB  1.6 GiB   61 GiB  39.25  1.04    -              host opcsdfpsksa0105
 4    hdd   0.04880   1.00000   50 GiB   19 GiB   19 GiB  2.7 MiB  560 MiB   31 GiB  38.21  1.02  433      up          osd.4
 5    hdd   0.04880   1.00000   50 GiB   20 GiB   19 GiB  2.6 MiB  1.1 GiB   30 GiB  40.30  1.07  427      up          osd.5
                         TOTAL  340 GiB  128 GiB  121 GiB   12 MiB  6.9 GiB  212 GiB  37.58
MIN/MAX VAR: 0.13/1.23  STDDEV: 13.71

--------------------------------
## rados df
POOL_NAME                     USED  OBJECTS  CLONES  COPIES  MISSING_ON_PRIMARY  UNFOUND  DEGRADED    RD_OPS       RD    WR_OPS       WR  USED COMPR  UNDER COMPR
.rgw.root                   52 KiB        4       0      12                   0        0         0       724  728 KiB        48   27 KiB         0 B          0 B
cache-pool                 8.6 GiB     1750       0    5250                   0        0      1750       449  265 MiB      4003   20 KiB         0 B          0 B
default.rgw.buckets.data   112 GiB    15498       0   46494                   0        0         0    249164   48 GiB   1241400  545 GiB         0 B          0 B
default.rgw.buckets.index  6.6 MiB       11       0      33                   0        0         0     82342  245 MiB     33831   17 MiB         0 B          0 B
default.rgw.buckets.non-ec 7.5 MiB      250       0     750                   0        0         0    524383  320 MiB    149486  124 MiB         0 B          0 B
default.rgw.control            0 B        8       0      24                   0        0         0         0      0 B         0      0 B         0 B          0 B
default.rgw.log            1.2 MiB      209       0     627                   0        0         0  26272099   25 GiB  17227290   69 MiB         0 B          0 B
default.rgw.meta           121 KiB       14       0      42                   0        0         0      3007  2.2 MiB       129   58 KiB         0 B          0 B
device_health_metrics          0 B        0       0       0                   0        0         0         0      0 B         0      0 B         0 B          0 B

total_objects    17744
total_used       128 GiB
total_avail      212 GiB
total_space      340 GiB
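Note that in rados df all 1750 degraded objects are in cache-pool, which is exactly its total object count: the whole pool lives in a single PG (pg_num 1, see the pool listing below), and that is also what trips the MANY_OBJECTS_PER_PG warning. If more PGs are wanted on the cache pool once all three cache hosts are healthy again, a sketch (32 is only an illustrative value; on some releases splitting a pool that is a cache tier additionally requires --yes-i-really-mean-it):

    # keep the autoscaler from fighting the manual change
    ceph osd pool set cache-pool pg_autoscale_mode off
    # split the single PG; pgp_num must follow pg_num
    ceph osd pool set cache-pool pg_num 32
    ceph osd pool set cache-pool pgp_num 32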
---------------------------------------------------
## ceph osd pool ls detail
pool 1 'device_health_metrics' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 64 pgp_num 64 autoscale_mode on last_change 17591 lfor 0/14881/15285 flags hashpspool stripe_width 0 pg_num_min 1 application mgr_devicehealth
pool 2 '.rgw.root' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 64 pgp_num 64 autoscale_mode on last_change 17835 lfor 0/3320/3318 flags hashpspool stripe_width 0 application rgw
pool 3 'default.rgw.log' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 17840 flags hashpspool stripe_width 0 application rgw
pool 4 'default.rgw.control' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 17847 flags hashpspool stripe_width 0 application rgw
pool 5 'default.rgw.meta' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 256 pgp_num 256 autoscale_mode on last_change 17855 lfor 0/15243/15287 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 8 application rgw
pool 6 'default.rgw.buckets.index' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 256 pgp_num 256 autoscale_mode on last_change 17590 lfor 0/15226/15287 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 8 application rgw
pool 7 'default.rgw.buckets.data' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode on last_change 17826 lfor 17826/17826/17826 flags hashpspool tiers 9 read_tier 9 write_tier 9 stripe_width 0 application rgw
pool 8 'default.rgw.buckets.non-ec' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 17555 flags hashpspool stripe_width 0 application rgw
pool 9 'cache-pool' replicated size 3 min_size 2 crush_rule 2 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 18019 lfor 17826/17826/17826 flags hashpspool,incomplete_clones max_bytes 10737418240 tier_of 7 cache_mode writeback target_bytes 7516192768 hit_set bloom{false_positive_probability: 0.05, target_size: 0, seed: 0} 300s x8 decay_rate 0 search_last_n 0 stripe_width 0 application rgw
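On the TOO_MANY_PGS warning, a rough back-of-the-envelope check: the 865 PGs above are all in size-3 pools, i.e. 2595 PG copies; spread over the 10 OSDs that are still in, that is ~259 per OSD, the exact number in the warning, whereas with all 12 OSDs in it would be ~216, back under the default mon_max_pg_per_osd of 250. So this warning should clear on its own once opcpcfpsksa0101 is reachable again (the ssh-copy-id steps from the health output) and osd.0/osd.9 rejoin. If the ceiling has to be raised in the meantime, a sketch (300 is an arbitrary example value):

    # temporary workaround only; restoring the down host is the real fix
    ceph config set global mon_max_pg_per_osd 300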