Hi!
Today I added some new OSDs to my Luminous cluster (nearly doubling the OSD count).
I then changed pg_num and pgp_num for that pool from 256 to 1024, because the cluster was complaining about too few PGs. (I have since realized this should have been done in smaller increments.)
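For reference, the change was essentially this (pool name is a placeholder):

  ceph osd pool set <poolname> pg_num 1024
  ceph osd pool set <poolname> pgp_num 1024

In hindsight, stepping e.g. 256 -> 512 -> 1024 and letting the cluster settle between steps would probably have been the safer way.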
This is the current status:
  health: HEALTH_ERR
          336568/1307562 objects misplaced (25.740%)
          Reduced data availability: 128 pgs inactive, 3 pgs peering, 1 pg stale
          Degraded data redundancy: 6985/1307562 objects degraded (0.534%), 19 pgs degraded, 19 pgs undersized
          107 slow requests are blocked > 32 sec
          218 stuck requests are blocked > 4096 sec

  data:
    pools:   2 pools, 1536 pgs
    objects: 638k objects, 2549 GB
    usage:   5210 GB used, 11295 GB / 16506 GB avail
    pgs:     0.195% pgs unknown
             8.138% pgs not active
             6985/1307562 objects degraded (0.534%)
             336568/1307562 objects misplaced (25.740%)
             855 active+clean
             517 active+remapped+backfill_wait
             107 activating+remapped
             31  active+remapped+backfilling
             15  activating+undersized+degraded+remapped
             4   active+undersized+degraded+remapped+backfilling
             3   unknown
             3   peering
             1   stale+active+clean
OSD tree:
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 16.12177 root default
-16 16.12177 datacenter dc01
-19 16.12177 pod dc01-agg01
-10 8.98700 rack dc01-rack02
-4 4.03899 host node1001
0 hdd 0.90999 osd.0 up 1.00000 1.00000
1 hdd 0.90999 osd.1 up 1.00000 1.00000
5 hdd 0.90999 osd.5 up 1.00000 1.00000
2 ssd 0.43700 osd.2 up 1.00000 1.00000
3 ssd 0.43700 osd.3 up 1.00000 1.00000
4 ssd 0.43700 osd.4 up 1.00000 1.00000
-7 4.94899 host node1002
9 hdd 0.90999 osd.9 up 1.00000 1.00000
10 hdd 0.90999 osd.10 up 1.00000 1.00000
11 hdd 0.90999 osd.11 up 1.00000 1.00000
12 hdd 0.90999 osd.12 up 1.00000 1.00000
6 ssd 0.43700 osd.6 up 1.00000 1.00000
7 ssd 0.43700 osd.7 up 1.00000 1.00000
8 ssd 0.43700 osd.8 up 1.00000 1.00000
-11 7.13477 rack dc01-rack03
-22 5.38678 host node1003
17 hdd 0.90970 osd.17 up 1.00000 1.00000
18 hdd 0.90970 osd.18 up 1.00000 1.00000
24 hdd 0.90970 osd.24 up 1.00000 1.00000
26 hdd 0.90970 osd.26 up 1.00000 1.00000
13 ssd 0.43700 osd.13 up 1.00000 1.00000
14 ssd 0.43700 osd.14 up 1.00000 1.00000
15 ssd 0.43700 osd.15 up 1.00000 1.00000
16 ssd 0.43700 osd.16 up 1.00000 1.00000
-25 1.74799 host node1004
19 ssd 0.43700 osd.19 up 1.00000 1.00000
20 ssd 0.43700 osd.20 up 1.00000 1.00000
21 ssd 0.43700 osd.21 up 1.00000 1.00000
22 ssd 0.43700 osd.22 up 1.00000 1.00000
The CRUSH rule does chooseleaf over racks, and the pool size is (temporarily!) set to 2.
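For clarity, the rule is essentially of the following form (the rule name, id and the device-class filter here are just illustrative, the relevant part is the chooseleaf step over racks), and size was lowered with "ceph osd pool set <poolname> size 2":

  rule hdd_by_rack {
          id 1
          type replicated
          min_size 1
          max_size 10
          step take default class hdd
          step chooseleaf firstn 0 type rack
          step emit
  }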
Why are PGs stuck in peering and activating?
"ceph df" shows that only 1,5TB are used on the pool, residing on the hdd's - which would perfectly fit the crush rule....(?)
Is this only a problem during recovery, so the cluster will return to OK once rebalancing is done, or can I take some action to unblock I/O on the HDD pool now?
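In case it helps: what I had in mind was inspecting the stuck PGs and throttling backfill so client I/O gets more room, roughly like this (the PG ID is a placeholder and the values are just guesses on my part):

  ceph health detail
  ceph pg dump_stuck inactive
  ceph pg <pgid> query
  ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1'

Would that be a reasonable approach here?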
This is a pre-prod cluster, so it does not have the highest priority, but I would appreciate it if we could use it before rebalancing completes.
Kind regards,
Kevin