Hi,
how did you end up with that many PGs per OSD? According to your output the pg_autoscaler is enabled; if the autoscaler did that, I would create a tracker issue for it. Then I would either disable the autoscaler or set its mode to "warn", and then reduce the pg_num for some of the pools, for example:
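A rough sketch, assuming you start with default.rgw.buckets.index (the pool name is taken from your pool list below; the target pg_num of 32 is only an illustration, size it for your own data):

ceph osd pool set default.rgw.buckets.index pg_autoscale_mode warn   # or "off" to disable it for this pool
ceph osd pool set default.rgw.buckets.index pg_num 32                # PGs are then merged gradually in the background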
What does your crush rule 2 look like? Can you share the dump of the rule with ID 2?
ceph osd crush rule ls
ceph osd crush rule dump <NAME>
Quoting farhad kh <farhad.khedriyan@xxxxxxxxx>:
hi,
I have a problem in my cluster. I used a cache tier for the RGW data: three hosts for the cache and three hosts for the data, with SSDs for the cache and HDDs for the data. I set a 20 GiB quota on the cache pool.
When one host of the cache tier had to go offline, this warning was raised. I decreased the quota to 10 GiB, but the warning was not resolved, and the dashboard does not show the correct PG status (1 active+undersized).
What is happening in my cluster? Why is this not resolved? Can anyone explain this situation?
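(For reference, the quota changes described above would look roughly like this; the pool name and the 10 GiB max_bytes value match the pool dump further down:)

ceph osd pool set-quota cache-pool max_bytes 21474836480   # 20 GiB, the initial quota
ceph osd pool set-quota cache-pool max_bytes 10737418240   # 10 GiB, after decreasing it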
------------------------------------------------
## ceph -s
opcpmfpsksa0101: Mon May 30 12:05:12 2022
cluster:
id: 54d2b1d6-207e-11ec-8c73-005056ac51bf
health: HEALTH_WARN
1 hosts fail cephadm check
1 pools have many more objects per pg than average
Degraded data redundancy: 1750/53232 objects degraded (3.287%), 1 pg degraded, 1 pg undersized
too many PGs per OSD (259 > max 250)
services:
mon: 3 daemons, quorum opcpmfpsksa0101,opcpmfpsksa0103,opcpmfpsksa0105 (age 3d)
mgr: opcpmfpsksa0101.apmwdm(active, since 5h)
osd: 12 osds: 10 up (since 95m), 10 in (since 85m)
rgw: 2 daemons active (2 hosts, 1 zones)
data:
pools: 9 pools, 865 pgs
objects: 17.74k objects, 41 GiB
usage: 128 GiB used, 212 GiB / 340 GiB avail
pgs: 1750/53232 objects degraded (3.287%)
864 active+clean
1 active+undersized+degraded
-----------------------------
## ceph health detail
HEALTH_WARN 1 hosts fail cephadm check; 1 pools have many more objects per pg than average; Degraded data redundancy: 1665/56910 objects degraded (2.926%), 1 pg degraded, 1 pg undersized; too many PGs per OSD (259 > max 250)
[WRN] CEPHADM_HOST_CHECK_FAILED: 1 hosts fail cephadm check
host opcpcfpsksa0101 (10.56.12.210) failed check: Failed to connect to opcpcfpsksa0101 (10.56.12.210).
Please make sure that the host is reachable and accepts connections using the cephadm SSH key
To add the cephadm SSH key to the host:
ceph cephadm get-pub-key > ~/ceph.pub
ssh-copy-id -f -i ~/ceph.pub root@10.56.12.210
To check that the host is reachable open a new shell with the --no-hosts flag:
cephadm shell --no-hosts
Then run the following:
ceph cephadm get-ssh-config > ssh_config
ceph config-key get mgr/cephadm/ssh_identity_key > ~/cephadm_private_key
chmod 0600 ~/cephadm_private_key
ssh -F ssh_config -i ~/cephadm_private_key root@10.56.12.210
[WRN] MANY_OBJECTS_PER_PG: 1 pools have many more objects per pg than average
pool cache-pool objects per pg (1665) is more than 79.2857 times cluster average (21)
[WRN] PG_DEGRADED: Degraded data redundancy: 1665/56910 objects degraded (2.926%), 1 pg degraded, 1 pg undersized
pg 9.0 is stuck undersized for 88m, current state active+undersized+degraded, last acting [10,11]
[WRN] TOO_MANY_PGS: too many PGs per OSD (259 > max 250)
--------------------------------------------------
## ceph osd df tree
ID   CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS  TYPE NAME
-1          0.35156  -         340 GiB  128 GiB  121 GiB  12 MiB   6.9 GiB  212 GiB  37.58  1.00  -            root default
-3          0.01959  -         0 B      0 B      0 B      0 B      0 B      0 B      0      0     -            host opcpcfpsksa0101
0    ssd    0.00980  0         0 B      0 B      0 B      0 B      0 B      0 B      0      0     0    down    osd.0
9    ssd    0.00980  0         0 B      0 B      0 B      0 B      0 B      0 B      0      0     0    down    osd.9
-5          0.01959  -         20 GiB   5.1 GiB  4.0 GiB  588 KiB  1.1 GiB  15 GiB   25.29  0.67  -            host opcpcfpsksa0103
7    ssd    0.00980  0.85004   10 GiB   483 MiB  75 MiB   539 KiB  407 MiB  9.5 GiB  4.72   0.13  3    up      osd.7
10   ssd    0.00980  0.55011   10 GiB   4.6 GiB  3.9 GiB  49 KiB   703 MiB  5.4 GiB  45.85  1.22  5    up      osd.10
-16         0.01959  -         20 GiB   5.5 GiB  4.0 GiB  542 KiB  1.5 GiB  15 GiB   27.28  0.73  -            host opcpcfpsksa0105
8    ssd    0.00980  0.70007   10 GiB   851 MiB  75 MiB   121 KiB  775 MiB  9.2 GiB  8.31   0.22  10   up      osd.8
11   ssd    0.00980  0.45013   10 GiB   4.6 GiB  3.9 GiB  421 KiB  742 MiB  5.4 GiB  46.24  1.23  5    up      osd.11
-10         0.09760  -         100 GiB  39 GiB   38 GiB   207 KiB  963 MiB  61 GiB   38.59  1.03  -            host opcsdfpsksa0101
1    hdd    0.04880  1.00000   50 GiB   19 GiB   19 GiB   207 KiB  639 MiB  31 GiB   38.77  1.03  424  up      osd.1
12   hdd    0.04880  1.00000   50 GiB   19 GiB   19 GiB   0 B      323 MiB  31 GiB   38.40  1.02  430  up      osd.12
-13         0.09760  -         100 GiB  39 GiB   38 GiB   4.9 MiB  1.8 GiB  61 GiB   39.41  1.05  -            host opcsdfpsksa0103
2    hdd    0.04880  1.00000   50 GiB   20 GiB   20 GiB   2.6 MiB  703 MiB  30 GiB   40.42  1.08  428  up      osd.2
3    hdd    0.04880  1.00000   50 GiB   19 GiB   18 GiB   2.3 MiB  1.1 GiB  31 GiB   38.39  1.02  429  up      osd.3
-19         0.09760  -         100 GiB  39 GiB   38 GiB   5.3 MiB  1.6 GiB  61 GiB   39.25  1.04  -            host opcsdfpsksa0105
4    hdd    0.04880  1.00000   50 GiB   19 GiB   19 GiB   2.7 MiB  560 MiB  31 GiB   38.21  1.02  433  up      osd.4
5    hdd    0.04880  1.00000   50 GiB   20 GiB   19 GiB   2.6 MiB  1.1 GiB  30 GiB   40.30  1.07  427  up      osd.5
TOTAL                          340 GiB  128 GiB  121 GiB  12 MiB   6.9 GiB  212 GiB  37.58
MIN/MAX VAR: 0.13/1.23  STDDEV: 13.71
--------------------------------
## rados df
POOL_NAME                   USED     OBJECTS  CLONES  COPIES  MISSING_ON_PRIMARY  UNFOUND  DEGRADED  RD_OPS    RD       WR_OPS    WR       USED COMPR  UNDER COMPR
.rgw.root                   52 KiB   4        0       12      0                   0        0         724       728 KiB  48        27 KiB   0 B         0 B
cache-pool                  8.6 GiB  1750     0       5250    0                   0        1750      449       265 MiB  4003      20 KiB   0 B         0 B
default.rgw.buckets.data    112 GiB  15498    0       46494   0                   0        0         249164    48 GiB   1241400   545 GiB  0 B         0 B
default.rgw.buckets.index   6.6 MiB  11       0       33      0                   0        0         82342     245 MiB  33831     17 MiB   0 B         0 B
default.rgw.buckets.non-ec  7.5 MiB  250      0       750     0                   0        0         524383    320 MiB  149486    124 MiB  0 B         0 B
default.rgw.control         0 B      8        0       24      0                   0        0         0         0 B      0         0 B      0 B         0 B
default.rgw.log             1.2 MiB  209      0       627     0                   0        0         26272099  25 GiB   17227290  69 MiB   0 B         0 B
default.rgw.meta            121 KiB  14       0       42      0                   0        0         3007      2.2 MiB  129       58 KiB   0 B         0 B
device_health_metrics       0 B      0        0       0       0                   0        0         0         0 B      0         0 B      0 B         0 B
total_objects 17744
total_used 128 GiB
total_avail 212 GiB
total_space 340 GiB
---------------------------------------------------
pool 1 'device_health_metrics' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 64 pgp_num 64 autoscale_mode on last_change 17591 lfor 0/14881/15285 flags hashpspool stripe_width 0 pg_num_min 1 application mgr_devicehealth
pool 2 '.rgw.root' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 64 pgp_num 64 autoscale_mode on last_change 17835 lfor 0/3320/3318 flags hashpspool stripe_width 0 application rgw
pool 3 'default.rgw.log' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 17840 flags hashpspool stripe_width 0 application rgw
pool 4 'default.rgw.control' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 17847 flags hashpspool stripe_width 0 application rgw
pool 5 'default.rgw.meta' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 256 pgp_num 256 autoscale_mode on last_change 17855 lfor 0/15243/15287 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 8 application rgw
pool 6 'default.rgw.buckets.index' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 256 pgp_num 256 autoscale_mode on last_change 17590 lfor 0/15226/15287 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 8 application rgw
pool 7 'default.rgw.buckets.data' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode on last_change 17826 lfor 17826/17826/17826 flags hashpspool tiers 9 read_tier 9 write_tier 9 stripe_width 0 application rgw
pool 8 'default.rgw.buckets.non-ec' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 17555 flags hashpspool stripe_width 0 application rgw
pool 9 'cache-pool' replicated size 3 min_size 2 crush_rule 2 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 18019 lfor 17826/17826/17826 flags hashpspool,incomplete_clones max_bytes 10737418240 tier_of 7 cache_mode writeback target_bytes 7516192768 hit_set bloom{false_positive_probability: 0.05, target_size: 0, seed: 0} 300s x8 decay_rate 0 search_last_n 0 stripe_width 0 application rgw
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx