Hi,

I have a problem in my cluster. I set up a cache tier in front of the RGW data pool: three hosts for the cache tier and three hosts for the data tier, with SSDs for cache and HDDs for data. I set a 20 GiB quota on the cache pool. When one of the cache-tier hosts went offline, the cluster raised the warnings below. I decreased the quota to 10 GiB, but that did not resolve them, and the dashboard does not show the correct PG status (1 active+undersized). What is happening in my cluster? Why does this not resolve? Can anyone explain this situation?

------------------------------------------------
## ceph -s
opcpmfpsksa0101: Mon May 30 12:05:12 2022

  cluster:
    id:     54d2b1d6-207e-11ec-8c73-005056ac51bf
    health: HEALTH_WARN
            1 hosts fail cephadm check
            1 pools have many more objects per pg than average
            Degraded data redundancy: 1750/53232 objects degraded (3.287%), 1 pg degraded, 1 pg undersized
            too many PGs per OSD (259 > max 250)

  services:
    mon: 3 daemons, quorum opcpmfpsksa0101,opcpmfpsksa0103,opcpmfpsksa0105 (age 3d)
    mgr: opcpmfpsksa0101.apmwdm(active, since 5h)
    osd: 12 osds: 10 up (since 95m), 10 in (since 85m)
    rgw: 2 daemons active (2 hosts, 1 zones)

  data:
    pools:   9 pools, 865 pgs
    objects: 17.74k objects, 41 GiB
    usage:   128 GiB used, 212 GiB / 340 GiB avail
    pgs:     1750/53232 objects degraded (3.287%)
             864 active+clean
             1   active+undersized+degraded

-----------------------------
## ceph health detail
HEALTH_WARN 1 hosts fail cephadm check; 1 pools have many more objects per pg than average; Degraded data redundancy: 1665/56910 objects degraded (2.926%), 1 pg degraded, 1 pg undersized; too many PGs per OSD (259 > max 250)
[WRN] CEPHADM_HOST_CHECK_FAILED: 1 hosts fail cephadm check
    host opcpcfpsksa0101 (10.56.12.210) failed check: Failed to connect to opcpcfpsksa0101 (10.56.12.210).
    Please make sure that the host is reachable and accepts connections using the cephadm SSH key
    To add the cephadm SSH key to the host:
    > ceph cephadm get-pub-key > ~/ceph.pub
    > ssh-copy-id -f -i ~/ceph.pub root@10.56.12.210
    To check that the host is reachable open a new shell with the --no-hosts flag:
    > cephadm shell --no-hosts
    Then run the following:
    > ceph cephadm get-ssh-config > ssh_config
    > ceph config-key get mgr/cephadm/ssh_identity_key > ~/cephadm_private_key
    > chmod 0600 ~/cephadm_private_key
    > ssh -F ssh_config -i ~/cephadm_private_key root@10.56.12.210
[WRN] MANY_OBJECTS_PER_PG: 1 pools have many more objects per pg than average
    pool cache-pool objects per pg (1665) is more than 79.2857 times cluster average (21)
[WRN] PG_DEGRADED: Degraded data redundancy: 1665/56910 objects degraded (2.926%), 1 pg degraded, 1 pg undersized
    pg 9.0 is stuck undersized for 88m, current state active+undersized+degraded, last acting [10,11]
[WRN] TOO_MANY_PGS: too many PGs per OSD (259 > max 250)
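For reference, the quota was set and later lowered with the standard set-quota command; from memory the exact invocations were along these lines (byte values are 20 GiB and 10 GiB, matching the max_bytes shown in the pool listing further down), and the stuck PG and the cache pool's CRUSH rule can be inspected directly:

    # how the byte quota on the cache pool was set and later reduced (a sketch)
    ceph osd pool set-quota cache-pool max_bytes 21474836480   # 20 GiB
    ceph osd pool set-quota cache-pool max_bytes 10737418240   # 10 GiB
    ceph osd pool get-quota cache-pool                         # verify what is applied

    # dig into the stuck PG (pg 9.0, last acting [10,11])
    ceph pg 9.0 query              # peering state, up/acting sets
    ceph osd crush rule dump       # cache-pool uses crush_rule 2

A pool quota only limits how much data the pool may hold; it has no influence on how many replicas CRUSH can place. With size 3 on cache-pool and only two cache hosts with up OSDs (osd.0 and osd.9 on opcpcfpsksa0101 are down, see below), CRUSH presumably cannot find a third host for the third replica, which would explain why pg 9.0 stays active+undersized no matter what the quota is set to.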
--------------------------------------------------
## ceph osd df tree
ID   CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS  TYPE NAME
-1          0.35156         -  340 GiB  128 GiB  121 GiB   12 MiB  6.9 GiB  212 GiB  37.58  1.00    -          root default
-3          0.01959         -      0 B      0 B      0 B      0 B      0 B      0 B      0     0    -              host opcpcfpsksa0101
 0    ssd   0.00980         0      0 B      0 B      0 B      0 B      0 B      0 B      0     0    0    down          osd.0
 9    ssd   0.00980         0      0 B      0 B      0 B      0 B      0 B      0 B      0     0    0    down          osd.9
-5          0.01959         -   20 GiB  5.1 GiB  4.0 GiB  588 KiB  1.1 GiB   15 GiB  25.29  0.67    -              host opcpcfpsksa0103
 7    ssd   0.00980   0.85004   10 GiB  483 MiB   75 MiB  539 KiB  407 MiB  9.5 GiB   4.72  0.13    3      up          osd.7
10    ssd   0.00980   0.55011   10 GiB  4.6 GiB  3.9 GiB   49 KiB  703 MiB  5.4 GiB  45.85  1.22    5      up          osd.10
-16         0.01959         -   20 GiB  5.5 GiB  4.0 GiB  542 KiB  1.5 GiB   15 GiB  27.28  0.73    -              host opcpcfpsksa0105
 8    ssd   0.00980   0.70007   10 GiB  851 MiB   75 MiB  121 KiB  775 MiB  9.2 GiB   8.31  0.22   10      up          osd.8
11    ssd   0.00980   0.45013   10 GiB  4.6 GiB  3.9 GiB  421 KiB  742 MiB  5.4 GiB  46.24  1.23    5      up          osd.11
-10         0.09760         -  100 GiB   39 GiB   38 GiB  207 KiB  963 MiB   61 GiB  38.59  1.03    -              host opcsdfpsksa0101
 1    hdd   0.04880   1.00000   50 GiB   19 GiB   19 GiB  207 KiB  639 MiB   31 GiB  38.77  1.03  424      up          osd.1
12    hdd   0.04880   1.00000   50 GiB   19 GiB   19 GiB      0 B  323 MiB   31 GiB  38.40  1.02  430      up          osd.12
-13         0.09760         -  100 GiB   39 GiB   38 GiB  4.9 MiB  1.8 GiB   61 GiB  39.41  1.05    -              host opcsdfpsksa0103
 2    hdd   0.04880   1.00000   50 GiB   20 GiB   20 GiB  2.6 MiB  703 MiB   30 GiB  40.42  1.08  428      up          osd.2
 3    hdd   0.04880   1.00000   50 GiB   19 GiB   18 GiB  2.3 MiB  1.1 GiB   31 GiB  38.39  1.02  429      up          osd.3
-19         0.09760         -  100 GiB   39 GiB   38 GiB  5.3 MiB  1.6 GiB   61 GiB  39.25  1.04    -              host opcsdfpsksa0105
 4    hdd   0.04880   1.00000   50 GiB   19 GiB   19 GiB  2.7 MiB  560 MiB   31 GiB  38.21  1.02  433      up          osd.4
 5    hdd   0.04880   1.00000   50 GiB   20 GiB   19 GiB  2.6 MiB  1.1 GiB   30 GiB  40.30  1.07  427      up          osd.5
                         TOTAL  340 GiB  128 GiB  121 GiB   12 MiB  6.9 GiB  212 GiB  37.58
MIN/MAX VAR: 0.13/1.23  STDDEV: 13.71

--------------------------------
## rados df
POOL_NAME                     USED  OBJECTS  CLONES  COPIES  MISSING_ON_PRIMARY  UNFOUND  DEGRADED    RD_OPS       RD    WR_OPS       WR  USED COMPR  UNDER COMPR
.rgw.root                   52 KiB        4       0      12                   0        0         0       724  728 KiB        48   27 KiB         0 B          0 B
cache-pool                 8.6 GiB     1750       0    5250                   0        0      1750       449  265 MiB      4003   20 KiB         0 B          0 B
default.rgw.buckets.data   112 GiB    15498       0   46494                   0        0         0    249164   48 GiB   1241400  545 GiB         0 B          0 B
default.rgw.buckets.index  6.6 MiB       11       0      33                   0        0         0     82342  245 MiB     33831   17 MiB         0 B          0 B
default.rgw.buckets.non-ec 7.5 MiB      250       0     750                   0        0         0    524383  320 MiB    149486  124 MiB         0 B          0 B
default.rgw.control            0 B        8       0      24                   0        0         0         0      0 B         0      0 B         0 B          0 B
default.rgw.log            1.2 MiB      209       0     627                   0        0         0  26272099   25 GiB  17227290   69 MiB         0 B          0 B
default.rgw.meta           121 KiB       14       0      42                   0        0         0      3007  2.2 MiB       129   58 KiB         0 B          0 B
device_health_metrics          0 B        0       0       0                   0        0         0         0      0 B         0      0 B         0 B          0 B

total_objects    17744
total_used       128 GiB
total_avail      212 GiB
total_space      340 GiB
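Note that in rados df all 1750 degraded objects are in cache-pool, which is exactly its total object count: the whole pool lives in a single PG (pg_num 1, see the pool listing below), and that is also what trips the MANY_OBJECTS_PER_PG warning. If more PGs are wanted on the cache pool once all three cache hosts are healthy again, a sketch (32 is only an illustrative value; on some releases splitting a pool that is a cache tier additionally requires --yes-i-really-mean-it):

    # keep the autoscaler from fighting the manual change
    ceph osd pool set cache-pool pg_autoscale_mode off
    # split the single PG; pgp_num must follow pg_num
    ceph osd pool set cache-pool pg_num 32
    ceph osd pool set cache-pool pgp_num 32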
---------------------------------------------------
## ceph osd pool ls detail
pool 1 'device_health_metrics' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 64 pgp_num 64 autoscale_mode on last_change 17591 lfor 0/14881/15285 flags hashpspool stripe_width 0 pg_num_min 1 application mgr_devicehealth
pool 2 '.rgw.root' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 64 pgp_num 64 autoscale_mode on last_change 17835 lfor 0/3320/3318 flags hashpspool stripe_width 0 application rgw
pool 3 'default.rgw.log' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 17840 flags hashpspool stripe_width 0 application rgw
pool 4 'default.rgw.control' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 17847 flags hashpspool stripe_width 0 application rgw
pool 5 'default.rgw.meta' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 256 pgp_num 256 autoscale_mode on last_change 17855 lfor 0/15243/15287 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 8 application rgw
pool 6 'default.rgw.buckets.index' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 256 pgp_num 256 autoscale_mode on last_change 17590 lfor 0/15226/15287 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 8 application rgw
pool 7 'default.rgw.buckets.data' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode on last_change 17826 lfor 17826/17826/17826 flags hashpspool tiers 9 read_tier 9 write_tier 9 stripe_width 0 application rgw
pool 8 'default.rgw.buckets.non-ec' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 17555 flags hashpspool stripe_width 0 application rgw
pool 9 'cache-pool' replicated size 3 min_size 2 crush_rule 2 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 18019 lfor 17826/17826/17826 flags hashpspool,incomplete_clones max_bytes 10737418240 tier_of 7 cache_mode writeback target_bytes 7516192768 hit_set bloom{false_positive_probability: 0.05, target_size: 0, seed: 0} 300s x8 decay_rate 0 search_last_n 0 stripe_width 0 application rgw
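On the TOO_MANY_PGS warning, a rough back-of-the-envelope check: the 865 PGs above are all in size-3 pools, i.e. 2595 PG copies; spread over the 10 OSDs that are still in, that is ~259 per OSD, the exact number in the warning, whereas with all 12 OSDs in it would be ~216, back under the default mon_max_pg_per_osd of 250. So this warning should clear on its own once opcpcfpsksa0101 is reachable again (the ssh-copy-id steps from the health output) and osd.0/osd.9 rejoin. If the ceiling has to be raised in the meantime, a sketch (300 is an arbitrary example value):

    # temporary workaround only; restoring the down host is the real fix
    ceph config set global mon_max_pg_per_osd 300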