I let the two working OSDs backfill over the last couple of days, and today I was able to add seven more OSDs before getting PGs stuck activating. Below are the OSD and health outputs after adding an 8th OSD and again hitting activating PGs.
ceph osd df tree
ID CLASS WEIGHT REWEIGHT SIZE USE AVAIL %USE VAR PGS TYPE NAME
-1 832.57654 - 868T 583T 285T 67.10 1.00 - root default
-7 13.97192 - 14307G 119G 14187G 0.83 0.01 - host ceph-node-1
0 ssd 1.74649 1.00000 1788G 14197M 1774G 0.78 0.01 88 osd.0
1 ssd 1.74649 1.00000 1788G 15089M 1773G 0.82 0.01 96 osd.1
2 ssd 1.74649 1.00000 1788G 13915M 1774G 0.76 0.01 86 osd.2
3 ssd 1.74649 1.00000 1788G 16078M 1772G 0.88 0.01 100 osd.3
4 ssd 1.74649 1.00000 1788G 17087M 1771G 0.93 0.01 109 osd.4
5 ssd 1.74649 1.00000 1788G 15961M 1772G 0.87 0.01 101 osd.5
6 ssd 1.74649 1.00000 1788G 15080M 1773G 0.82 0.01 95 osd.6
7 ssd 1.74649 1.00000 1788G 14767M 1773G 0.81 0.01 93 osd.7
-3 727.64771 - 727T 573T 153T 78.88 1.18 - host ceph-node-2
8 hdd 9.09560 1.00000 9313G 7516G 1797G 80.70 1.20 168 osd.8
9 hdd 9.09560 1.00000 9313G 7005G 2308G 75.21 1.12 157 osd.9
10 hdd 9.09560 1.00000 9313G 8002G 1311G 85.92 1.28 179 osd.10
11 hdd 9.09560 0.95000 9313G 7850G 1463G 84.28 1.26 176 osd.11
12 hdd 9.09560 1.00000 9313G 7136G 2177G 76.62 1.14 160 osd.12
13 hdd 9.09560 1.00000 9313G 8119G 1194G 87.18 1.30 182 osd.13
14 hdd 9.09560 1.00000 9313G 7629G 1684G 81.92 1.22 171 osd.14
15 hdd 9.09560 1.00000 9313G 7046G 2267G 75.66 1.13 158 osd.15
16 hdd 9.09560 1.00000 9313G 7315G 1998G 78.54 1.17 164 osd.16
17 hdd 9.09560 1.00000 9313G 7320G 1993G 78.60 1.17 164 osd.17
18 hdd 9.09560 1.00000 9313G 7090G 2222G 76.13 1.13 159 osd.18
19 hdd 9.09560 1.00000 9313G 7187G 2126G 77.17 1.15 161 osd.19
20 hdd 9.09560 1.00000 9313G 7986G 1327G 85.75 1.28 179 osd.20
21 hdd 9.09560 1.00000 9313G 7271G 2042G 78.07 1.16 163 osd.21
22 hdd 9.09560 1.00000 9313G 7581G 1732G 81.40 1.21 170 osd.22
23 hdd 9.09560 1.00000 9313G 7510G 1802G 80.64 1.20 168 osd.23
24 hdd 9.09560 1.00000 9313G 7983G 1330G 85.71 1.28 179 osd.24
25 hdd 9.09560 1.00000 9313G 7905G 1408G 84.88 1.27 177 osd.25
26 hdd 9.09560 1.00000 9313G 7626G 1687G 81.88 1.22 171 osd.26
27 hdd 9.09560 1.00000 9313G 7270G 2043G 78.06 1.16 163 osd.27
28 hdd 9.09560 1.00000 9313G 7183G 2130G 77.12 1.15 161 osd.28
29 hdd 9.09560 1.00000 9313G 7492G 1821G 80.44 1.20 168 osd.29
30 hdd 9.09560 1.00000 9313G 6960G 2353G 74.73 1.11 156 osd.30
31 hdd 9.09560 1.00000 9313G 7093G 2220G 76.16 1.14 159 osd.31
32 hdd 9.09560 1.00000 9313G 6869G 2444G 73.76 1.10 154 osd.32
33 hdd 9.09560 1.00000 9313G 6738G 2575G 72.34 1.08 151 osd.33
34 hdd 9.09560 1.00000 9313G 7644G 1669G 82.08 1.22 171 osd.34
35 hdd 9.09560 1.00000 9313G 6646G 2667G 71.36 1.06 149 osd.35
36 hdd 9.09560 1.00000 9313G 7941G 1372G 85.27 1.27 178 osd.36
37 hdd 9.09560 1.00000 9313G 7141G 2172G 76.68 1.14 160 osd.37
38 hdd 9.09560 1.00000 9313G 7225G 2088G 77.58 1.16 162 osd.38
39 hdd 9.09560 1.00000 9313G 7141G 2172G 76.67 1.14 160 osd.39
40 hdd 9.09560 1.00000 9313G 7942G 1371G 85.27 1.27 178 osd.40
41 hdd 9.09560 1.00000 9313G 6736G 2577G 72.33 1.08 151 osd.41
42 hdd 9.09560 1.00000 9313G 7286G 2027G 78.23 1.17 163 osd.42
43 hdd 9.09560 1.00000 9313G 7051G 2262G 75.71 1.13 158 osd.43
44 hdd 9.09560 1.00000 9313G 7852G 1461G 84.31 1.26 176 osd.44
45 hdd 9.09560 1.00000 9313G 6735G 2578G 72.31 1.08 151 osd.45
46 hdd 9.09560 1.00000 9313G 7539G 1774G 80.95 1.21 169 osd.46
47 hdd 9.09560 1.00000 9313G 7048G 2265G 75.68 1.13 158 osd.47
48 hdd 9.09560 1.00000 9313G 7895G 1418G 84.77 1.26 176 osd.48
49 hdd 9.09560 1.00000 9313G 7186G 2126G 77.16 1.15 161 osd.49
50 hdd 9.09560 1.00000 9313G 7314G 1998G 78.54 1.17 164 osd.50
51 hdd 9.09560 1.00000 9313G 6977G 2336G 74.92 1.12 156 osd.51
52 hdd 9.09560 1.00000 9313G 7853G 1460G 84.32 1.26 176 osd.52
53 hdd 9.09560 1.00000 9313G 7138G 2175G 76.64 1.14 160 osd.53
54 hdd 9.09560 1.00000 9313G 7534G 1778G 80.90 1.21 169 osd.54
55 hdd 9.09560 1.00000 9313G 7537G 1775G 80.93 1.21 169 osd.55
56 hdd 9.09560 1.00000 9313G 6853G 2460G 73.58 1.10 153 osd.56
57 hdd 9.09560 1.00000 9313G 7512G 1801G 80.66 1.20 168 osd.57
58 hdd 9.09560 1.00000 9313G 7227G 2086G 77.60 1.16 162 osd.58
59 hdd 9.09560 1.00000 9313G 7138G 2175G 76.64 1.14 160 osd.59
60 hdd 9.09560 1.00000 9313G 7893G 1420G 84.75 1.26 177 osd.60
61 hdd 9.09560 1.00000 9313G 6870G 2443G 73.77 1.10 154 osd.61
62 hdd 9.09560 1.00000 9313G 6734G 2579G 72.30 1.08 151 osd.62
63 hdd 9.09560 1.00000 9313G 7763G 1550G 83.35 1.24 174 osd.63
64 hdd 9.09560 1.00000 9313G 7491G 1822G 80.43 1.20 168 osd.64
65 hdd 9.09560 1.00000 9313G 6644G 2669G 71.33 1.06 149 osd.65
66 hdd 9.09560 0.95000 9313G 7764G 1549G 83.37 1.24 174 osd.66
67 hdd 9.09560 0.95000 9313G 7862G 1451G 84.42 1.26 176 osd.67
68 hdd 9.09560 1.00000 9313G 7675G 1638G 82.41 1.23 172 osd.68
69 hdd 9.09560 1.00000 9313G 6557G 2756G 70.40 1.05 147 osd.69
70 hdd 9.09560 1.00000 9313G 7628G 1685G 81.90 1.22 171 osd.70
71 hdd 9.09560 1.00000 9313G 7627G 1686G 81.89 1.22 171 osd.71
72 hdd 9.09560 1.00000 9313G 7494G 1819G 80.47 1.20 168 osd.72
73 hdd 9.09560 1.00000 9313G 6423G 2889G 68.97 1.03 144 osd.73
74 hdd 9.09560 1.00000 9313G 7098G 2215G 76.21 1.14 159 osd.74
75 hdd 9.09560 1.00000 9313G 6156G 3157G 66.10 0.99 138 osd.75
76 hdd 9.09560 1.00000 9313G 6735G 2578G 72.31 1.08 151 osd.76
77 hdd 9.09560 1.00000 9313G 7756G 1557G 83.28 1.24 174 osd.77
78 hdd 9.09560 1.00000 9313G 7200G 2113G 77.31 1.15 161 osd.78
79 hdd 9.09560 1.00000 9313G 7582G 1731G 81.41 1.21 170 osd.79
80 hdd 9.09560 1.00000 9313G 7409G 1904G 79.56 1.19 166 osd.80
81 hdd 9.09560 1.00000 9313G 7900G 1413G 84.82 1.26 177 osd.81
82 hdd 9.09560 1.00000 9313G 7226G 2087G 77.59 1.16 162 osd.82
83 hdd 9.09560 1.00000 9313G 7225G 2088G 77.58 1.16 162 osd.83
84 hdd 9.09560 1.00000 9313G 7628G 1685G 81.90 1.22 171 osd.84
85 hdd 9.09560 1.00000 9313G 7225G 2087G 77.58 1.16 162 osd.85
86 hdd 9.09560 1.00000 9313G 7474G 1839G 80.25 1.20 167 osd.86
87 hdd 9.09560 1.00000 9313G 7895G 1418G 84.77 1.26 177 osd.87
-10 90.95688 - 127T 9170G 118T 7.03 0.10 - host ceph-node-3
88 hdd 9.09569 1.00000 9313G 2849G 6464G 30.59 0.46 43 osd.88
89 hdd 9.09569 1.00000 9313G 3633G 5680G 39.01 0.58 65 osd.89
90 hdd 9.09569 1.00000 9313G 1271G 8042G 13.66 0.20 18 osd.90
91 hdd 9.09569 1.00000 9313G 1026G 8287G 11.02 0.16 15 osd.91
92 hdd 9.09569 1.00000 9313G 114G 9199G 1.23 0.02 2 osd.92
93 hdd 9.09569 1.00000 9313G 147G 9166G 1.59 0.02 0 osd.93
94 hdd 9.09569 1.00000 9313G 50621M 9264G 0.53 0.01 0 osd.94
95 hdd 9.09569 1.00000 9313G 69424M 9246G 0.73 0.01 0 osd.95
96 hdd 9.09569 1.00000 9313G 3727M 9310G 0.04 0 0 osd.96
97 hdd 9.09569 1.00000 9313G 1645M 9312G 0.02 0 0 osd.97
98 hdd 0 1.00000 9313G 1279M 9312G 0.01 0 0 osd.98
99 hdd 0 1.00000 9313G 1271M 9312G 0.01 0 0 osd.99
100 hdd 0 1.00000 9313G 1277M 9312G 0.01 0 0 osd.100
101 hdd 0 1.00000 9313G 1270M 9312G 0.01 0 0 osd.101
TOTAL 868T 583T 285T 67.10
MIN/MAX VAR: 0/1.30 STDDEV: 31.40
ceph -s
cluster:
id: e9d4da87-25d8-4490-be4f-811284d2cb04
health: HEALTH_WARN
247564524/2168283597 objects misplaced (11.418%)
Reduced data availability: 13 pgs inactive
1 slow requests are blocked > 32 sec
services:
mon: 4 daemons, quorum ceph-node-4,ceph-node-1,ceph-node-2,ceph-node-5
mgr: ceph-node-4(active), standbys: ceph-node-1, ceph-node-5
mds: fusion-data-1/1/1 up {0=ceph-node-1=up:active}
osd: 102 osds: 102 up, 102 in; 777 remapped pgs
data:
pools: 2 pools, 1280 pgs
objects: 159M objects, 370 TB
usage: 583 TB used, 285 TB / 868 TB avail
pgs: 1.016% pgs not active
247564524/2168283597 objects misplaced (11.418%)
708 active+remapped+backfill_wait
503 active+clean
52 active+remapped+backfilling
13 activating+remapped
4 active+recovery_wait+remapped
io:
client: 1275 B/s wr, 0 op/s rd, 0 op/s wr
recovery: 542 MB/s, 233 objects/s
ceph health detail
HEALTH_WARN 247569110/2168283597 objects misplaced (11.418%); Reduced data availability: 13 pgs inactive; 1 slow requests are blocked > 32 sec
OBJECT_MISPLACED 247569110/2168283597 objects misplaced (11.418%)
PG_AVAILABILITY Reduced data availability: 13 pgs inactive
pg 8.cc is stuck inactive for 80.951719, current state activating+remapped, last acting [33,80,29,61,90,31,69,76,67,79,73,40,60]
pg 8.127 is stuck inactive for 81.022330, current state activating+remapped, last acting [25,40,66,20,36,8,76,65,61,39,33,87,58]
pg 8.13c is stuck inactive for 80.956013, current state activating+remapped, last acting [77,79,29,85,66,26,33,53,59,48,44,72,46]
pg 8.143 is stuck inactive for 81.027811, current state activating+remapped, last acting [61,72,79,89,67,56,40,9,66,81,80,32,13]
pg 8.151 is stuck inactive for 81.029819, current state activating+remapped, last acting [37,87,14,43,27,52,24,70,32,77,44,10,9]
pg 8.154 is stuck inactive for 80.890116, current state activating+remapped, last acting [20,67,77,44,64,37,55,63,38,30,74,70,40]
pg 8.1c3 is stuck inactive for 81.004076, current state activating+remapped, last acting [19,11,81,67,80,16,48,72,76,65,86,60,45]
pg 8.1f5 is stuck inactive for 80.995108, current state activating+remapped, last acting [77,66,86,42,48,49,69,67,12,71,36,44,29]
pg 8.2ad is stuck inactive for 81.103978, current state activating+remapped, last acting [52,65,24,9,51,36,49,62,42,66,55,78,46]
pg 8.2e7 is stuck inactive for 80.942131, current state activating+remapped, last acting [31,19,51,39,17,88,36,44,8,83,13,69,18]
pg 8.395 is stuck inactive for 81.061636, current state activating+remapped, last acting [74,66,68,71,61,53,40,72,11,62,34,81,27]
pg 8.399 is stuck inactive for 81.112158, current state activating+remapped, last acting [55,19,18,87,28,39,75,8,42,64,31,40,66]
pg 8.3e3 is stuck inactive for 80.985306, current state activating+remapped, last acting [15,84,46,13,34,8,36,86,56,18,75,43,61]
REQUEST_SLOW 1 slow requests are blocked > 32 sec
1 ops are blocked > 131.072 sec
osd.77 has blocked requests > 131.072 sec
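If it helps anyone dig further, one of the stuck PGs can be inspected along these lines; a minimal sketch using standard Luminous-era commands, with pg 8.cc taken from the health detail output above as the example:

ceph pg dump_stuck inactive    (list the PGs currently stuck inactive)
ceph pg 8.cc query             (dump the full peering/activation state of one stuck PG)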
On Thu, Mar 29, 2018 at 2:50 AM, Jakub Jaszewski <jaszewski.jakub@xxxxxxxxx> wrote:
Hi Jon, can you reweight one OSD to its default value and share the outcome of "ceph osd df tree; ceph -s; ceph health detail"?

Recently I was adding a new node, 12x 4TB, one disk at a time, and faced the activating+remapped state for a few hours. Not sure, but maybe that was caused by the "osd_max_backfills" value and the queue of PGs awaiting backfill.

# ceph -s
cluster:
id: 1023c49f-3a10-42de-9f62-9b122db21e1e
health: HEALTH_WARN
noscrub,nodeep-scrub flag(s) set
1 nearfull osd(s)
19 pool(s) nearfull
33336982/289660233 objects misplaced (11.509%)
Reduced data availability: 29 pgs inactive
Degraded data redundancy: 788023/289660233 objects degraded (0.272%), 782 pgs unclean, 54 pgs degraded, 48 pgs undersized
services:
mon: 3 daemons, quorum mon1,mon2,mon3
mgr: mon2(active), standbys: mon3, mon1
osd: 120 osds: 120 up, 120 in; 779 remapped pgs
flags noscrub,nodeep-scrub
rgw: 3 daemons active
data:
pools: 19 pools, 3760 pgs
objects: 38285k objects, 146 TB
usage: 285 TB used, 150 TB / 436 TB avail
pgs: 0.771% pgs not active
788023/289660233 objects degraded (0.272%)
33336982/289660233 objects misplaced (11.509%)
2978 active+clean
646 active+remapped+backfill_wait
57 active+remapped+backfilling
27 active+undersized+degraded+remapped+backfill_wait
25 activating+remapped
17 active+undersized+degraded+remapped+backfilling
4 activating+undersized+degraded+remapped
3 active+recovery_wait+degraded
3 active+recovery_wait+degraded+remapped
io:
client: 2228 kB/s rd, 54831 kB/s wr, 539 op/s rd, 756 op/s wr
recovery: 1360 MB/s, 348 objects/s

Now all PGs are active+clean.

Regards
Jakub
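For reference, the reweight and the "osd_max_backfills" setting Jakub mentions map to commands roughly like the following; a sketch only, with osd.11 and the value 2 chosen purely as examples and Luminous-style injectargs assumed:

ceph osd reweight 11 1.0                             (restore one OSD's override reweight to the default of 1.0)
ceph daemon osd.11 config get osd_max_backfills      (run on that OSD's host; show the current per-OSD backfill limit)
ceph tell osd.* injectargs '--osd_max_backfills 2'   (temporarily raise the per-OSD backfill limit on all OSDs)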
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com