Hi Paul,
Before the upgrade, the cluster had 17 OSD servers (8 OSDs per server) and 3 MDS/RGW servers, with 2 active MDS daemons. I then added 5 OSD servers (16 OSDs per server). After that, one active server crashed (I rebooted it), and the MDS could not return to a healthy state, so I added two new MDS servers and removed one of the original MDS servers. For the new OSDs (6 TB each), I first set the CRUSH weight to 1, and the cluster started rebalancing. Before the rebalance finished, I changed the weight to 5.45798.
More info below:
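As a side note on the weight value: by convention the CRUSH weight of an OSD is its capacity expressed in TiB, which is why a nominal 6 TB disk ends up near 5.46 rather than 6. A minimal sketch of that conversion follows; the helper name is made up for illustration, and the weight Ceph actually assigns is derived from the real device size, so it differs slightly from the nominal figure.

```python
# Illustrative helper, not part of Ceph: convert a disk size in bytes
# to the conventional CRUSH weight (capacity expressed in TiB).
TIB = 2 ** 40

def crush_weight_from_bytes(size_bytes: float) -> float:
    return size_bytes / TIB

# Nominal 6 TB disk (6 * 10^12 bytes); the result is close to, but not
# exactly, the 5.45798 reported by `ceph osd df tree`.
nominal_6tb = crush_weight_from_bytes(6e12)
print(f"{nominal_6tb:.5f}")
```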
# ceph -s
  cluster:
    id:     a00cc99c-f9f9-4dd9-9281-43cd12310e41
    health: HEALTH_WARN
            28750646/577747527 objects misplaced (4.976%)
            Degraded data redundancy: 2724676/577747527 objects degraded (0.472%), 1476 pgs unclean, 451 pgs degraded, 356 pgs undersized

  services:
    mon: 3 daemons, quorum ark0008,ark0009,ark0010
    mgr: ark0009(active), standbys: ark0010, ark0008, ark0008.hobot.cc
    mds: cephfs-2/2/1 up {0=ark0018.hobot.cc=up:active,1=ark0020.hobot.cc=up:active}, 2 up:standby
    osd: 213 osds: 213 up, 209 in; 1433 remapped pgs
    rgw: 1 daemon active

  data:
    pools:   17 pools, 10324 pgs
    objects: 183M objects, 124 TB
    usage:   425 TB used, 479 TB / 904 TB avail
    pgs:     2724676/577747527 objects degraded (0.472%)
             28750646/577747527 objects misplaced (4.976%)
             8848 active+clean
             565  active+remapped+backfilling
             449  active+remapped+backfill_wait
             319  active+undersized+degraded+remapped+backfilling
             36   active+undersized+degraded+remapped+backfill_wait
             32   active+recovery_wait+degraded
             29   active+recovery_wait+degraded+remapped
             20   active+degraded+remapped+backfill_wait
             14   active+degraded+remapped+backfilling
             11   active+recovery_wait
             1    active+recovery_wait+undersized+degraded+remapped

  io:
    client:   2356 B/s rd, 9051 kB/s wr, 0 op/s rd, 185 op/s wr
    recovery: 459 MB/s, 709 objects/s
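The summary counters in this output can be cross-checked against the per-state PG list. The sketch below tallies the states copied from above and adds a very rough time estimate. The estimate assumes the current rate of 709 objects/s stays constant and that every misplaced or degraded object copy has to be moved exactly once, neither of which is guaranteed.

```python
# PG state counts copied from the `ceph -s` output above.
pg_states = {
    "active+clean": 8848,
    "active+remapped+backfilling": 565,
    "active+remapped+backfill_wait": 449,
    "active+undersized+degraded+remapped+backfilling": 319,
    "active+undersized+degraded+remapped+backfill_wait": 36,
    "active+recovery_wait+degraded": 32,
    "active+recovery_wait+degraded+remapped": 29,
    "active+degraded+remapped+backfill_wait": 20,
    "active+degraded+remapped+backfilling": 14,
    "active+recovery_wait": 11,
    "active+recovery_wait+undersized+degraded+remapped": 1,
}

def count_with(flag):
    """Sum the PGs whose state contains the given flag as a whole token."""
    return sum(n for state, n in pg_states.items() if flag in state.split("+"))

total = sum(pg_states.values())
unclean = total - pg_states["active+clean"]
degraded = count_with("degraded")
undersized = count_with("undersized")
remapped = count_with("remapped")

# Rough ETA under the assumptions stated in the text above.
pending_objects = 28750646 + 2724676   # misplaced + degraded
eta_hours = pending_objects / 709 / 3600

print(total, unclean, degraded, undersized, remapped)
print(f"roughly {eta_hours:.0f} hours at the current rate")
```

The tallies agree with the header lines: 10324 PGs in total, 1476 unclean, 451 degraded, 356 undersized, and 1433 remapped.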
# ceph health detail
HEALTH_WARN 28736684/577747554 objects misplaced (4.974%); Degraded data redundancy: 2722451/577747554 objects degraded (0.471%), 1475 pgs unclean, 451 pgs degraded, 356 pgs undersized
pg 5.dee is stuck unclean for 93114.056729, current state active+remapped+backfilling, last acting [19,153,64]
pg 5.df4 is stuck undersized for 86028.395042, current state active+undersized+degraded+remapped+backfilling, last acting [81,83]
pg 5.df8 is stuck unclean for 10529.471700, current state active+remapped+backfilling, last acting [53,212,106]
pg 5.dfa is stuck unclean for 86193.279939, current state active+remapped+backfill_wait, last acting [58,122,98]
pg 5.dfd is stuck unclean for 21944.059088, current state active+remapped+backfilling, last acting [119,91,22]
pg 5.e01 is stuck undersized for 73773.177963, current state active+undersized+degraded+remapped+backfilling, last acting [88,116]
pg 5.e02 is stuck undersized for 10615.864226, current state active+undersized+degraded+remapped+backfilling, last acting [112,110]
pg 5.e04 is active+degraded+remapped+backfilling, acting [44,10,104]
pg 5.e07 is stuck undersized for 86060.059937, current state active+undersized+degraded+remapped+backfilling, last acting [100,65]
pg 5.e09 is stuck unclean for 86247.708352, current state active+remapped+backfilling, last acting [19,187,46]
pg 5.e0a is stuck unclean for 93073.574629, current state active+remapped+backfilling, last acting [92,13,118]
pg 5.e0b is stuck unclean for 86247.949138, current state active+remapped+backfilling, last acting [31,54,68]
pg 5.e10 is stuck unclean for 17390.342397, current state active+remapped+backfill_wait, last acting [71,202,119]
pg 5.e13 is stuck unclean for 93092.549049, current state active+remapped+backfilling, last acting [33,90,110]
pg 5.e16 is stuck unclean for 86250.883911, current state active+remapped+backfill_wait, last acting [79,108,56]
pg 5.e17 is stuck undersized for 15167.783137, current state active+undersized+degraded+remapped+backfill_wait, last acting [42,28]
pg 5.e18 is stuck unclean for 18122.375128, current state active+remapped+backfill_wait, last acting [26,43,31]
pg 5.e20 is stuck unclean for 86255.524287, current state active+remapped+backfilling, last acting [122,52,7]
pg 5.e27 is stuck unclean for 10706.283143, current state active+remapped+backfill_wait, last acting [56,104,73]
pg 5.e29 is stuck undersized for 86036.590643, current state active+undersized+degraded+remapped+backfilling, last acting [49,35]
pg 5.e2c is stuck unclean for 86257.751565, current state active+remapped+backfilling, last acting [70,106,91]
pg 5.e2e is stuck undersized for 10615.804510, current state active+undersized+degraded+remapped+backfilling, last acting [35,103]
pg 5.e32 is stuck undersized for 74758.649684, current state active+undersized+degraded+remapped+backfilling, last acting [39,53]
pg 5.e35 is stuck unclean for 86195.364365, current state active+remapped+backfill_wait, last acting [60,133,71]
pg 5.e36 is stuck unclean for 10706.301969, current state active+remapped+backfilling, last acting [119,132,27]
pg 5.e39 is stuck undersized for 7625.972530, current state active+undersized+degraded+remapped+backfill_wait, last acting [59,68]
pg 5.e3e is stuck undersized for 15167.771334, current state active+undersized+degraded+remapped+backfilling, last acting [37,108]
pg 5.e3f is stuck unclean for 86446.144228, current state active+remapped+backfilling, last acting [36,77,70]
pg 5.e41 is stuck undersized for 85809.892887, current state active+undersized+degraded+remapped+backfilling, last acting [34,74]
pg 5.e42 is stuck unclean for 93066.921410, current state active+remapped+backfill_wait, last acting [50,116,203]
pg 5.e43 is stuck unclean for 86440.050082, current state active+remapped+backfilling, last acting [101,62,98]
pg 5.e45 is stuck undersized for 57193.815788, current state active+undersized+degraded+remapped+backfilling, last acting [46,89]
pg 5.e47 is stuck undersized for 10855.704014, current state active+undersized+degraded+remapped+backfill_wait, last acting [45,159]
pg 5.e48 is stuck unclean for 86257.863404, current state active+remapped+backfill_wait, last acting [179,124,56]
pg 5.e49 is stuck unclean for 86243.704781, current state active+remapped+backfill_wait, last acting [19,70,97]
pg 5.e4b is stuck unclean for 93119.500757, current state active+remapped+backfilling, last acting [19,97,28]
pg 5.e52 is active+recovery_wait+degraded+remapped, acting [35,60,16]
pg 5.e53 is stuck unclean for 93107.389507, current state active+remapped+backfilling, last acting [66,137,87]
pg 5.e5a is stuck unclean for 6972.965649, current state active+remapped+backfilling, last acting [110,210,25]
pg 5.e5c is stuck unclean for 86252.299945, current state active+remapped+backfill_wait, last acting [60,77,106]
pg 5.e5d is stuck unclean for 11554.357804, current state active+remapped+backfill_wait, last acting [136,91,45]
pg 5.e5f is stuck unclean for 86252.173042, current state active+remapped+backfilling, last acting [66,0,7]
pg 5.e60 is stuck undersized for 74698.272305, current state active+undersized+degraded+remapped+backfilling, last acting [89,44]
pg 5.e61 is stuck unclean for 10529.466552, current state active+remapped+backfilling, last acting [44,0,28]
pg 5.e63 is stuck unclean for 93120.827771, current state active+remapped+backfill_wait, last acting [26,123,42]
pg 5.e65 is stuck unclean for 86212.097907, current state active+remapped+backfilling, last acting [28,37,82]
pg 5.e71 is stuck unclean for 86223.575372, current state active+remapped+backfill_wait, last acting [121,56,110]
pg 5.e72 is stuck unclean for 16576.615045, current state active+remapped+backfill_wait, last acting [40,134,105]
pg 5.e74 is stuck unclean for 86221.162039, current state active+remapped+backfill_wait, last acting [76,169,59]
pg 5.e77 is stuck unclean for 93066.629341, current state active+remapped+backfilling, last acting [50,72,137]
pg 5.e79 is stuck unclean for 11772.868324, current state active+remapped+backfill_wait, last acting [52,24,101]
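To summarize a long `ceph health detail` listing like this one, the "stuck" lines can be parsed into records. The following is only a sketch: the regular expression is written against the line format shown above, the function name is made up, and only three sample lines are embedded.

```python
import re
from collections import Counter

# A few lines copied from the `ceph health detail` output above.
sample = """\
pg 5.dee is stuck unclean for 93114.056729, current state active+remapped+backfilling, last acting [19,153,64]
pg 5.df4 is stuck undersized for 86028.395042, current state active+undersized+degraded+remapped+backfilling, last acting [81,83]
pg 5.e04 is active+degraded+remapped+backfilling, acting [44,10,104]
"""

STUCK = re.compile(
    r"pg (?P<pgid>\S+) is stuck (?P<kind>\w+) for (?P<secs>[\d.]+), "
    r"current state (?P<state>\S+), last acting \[(?P<acting>[\d,]+)\]"
)

def parse_stuck(text):
    """Return one dict per 'stuck' line; lines in other formats are skipped."""
    rows = []
    for line in text.splitlines():
        m = STUCK.match(line)
        if m:
            rows.append({
                "pgid": m.group("pgid"),
                "kind": m.group("kind"),
                "hours": float(m.group("secs")) / 3600,
                "state": m.group("state"),
                "acting": [int(x) for x in m.group("acting").split(",")],
            })
    return rows

rows = parse_stuck(sample)
by_kind = Counter(r["kind"] for r in rows)
# PGs whose acting set has fewer OSDs than the pool's replica size of 3.
short_acting = [r["pgid"] for r in rows if len(r["acting"]) < 3]
print(dict(by_kind), short_acting)
```

In the listing above, the undersized PGs are exactly the ones with two OSDs in their acting set.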
# ceph osd pool ls detail
pool 1 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 271 flags hashpspool stripe_width 0 application rgw
pool 2 'default.rgw.control' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 275 flags hashpspool stripe_width 0 application rgw
pool 3 'default.rgw.meta' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 279 flags hashpspool stripe_width 0 application rgw
pool 4 'default.rgw.log' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 281 flags hashpspool stripe_width 0 application rgw
pool 5 'cephfs_data' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 5120 pgp_num 5120 last_change 101613 lfor 0/21288 flags hashpspool max_bytes 500000000000000 stripe_width 0 application cephfs
pool 6 'cephfs_metadata' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 2048 pgp_num 2048 last_change 5118 lfor 0/5108 flags hashpspool stripe_width 0 application cephfs
pool 8 'rbd' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1000 pgp_num 1000 last_change 1984 lfor 0/1981 flags hashpspool max_bytes 100000000000000 stripe_width 0 application rbd
removed_snaps [1~3]
pool 9 'rbd-algorithm' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 300 pgp_num 300 last_change 578 flags hashpspool stripe_width 0 application rbd
pool 10 'rbd-smartlife' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 300 pgp_num 300 last_change 579 flags hashpspool stripe_width 0 application rbd
pool 11 'rbd-smartcity' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 300 pgp_num 300 last_change 580 flags hashpspool stripe_width 0 application rbd
pool 12 'rbd-smartauto' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 300 pgp_num 300 last_change 581 flags hashpspool stripe_width 0 application rbd
pool 13 'rbd-infras' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 300 pgp_num 300 last_change 584 flags hashpspool stripe_width 0 application rbd
removed_snaps [1~3]
pool 14 'rbd-env-home' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 300 pgp_num 300 last_change 1154 flags hashpspool stripe_width 0 application rbd
removed_snaps [1~5]
pool 15 'default.rgw.buckets.index' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 39782 owner 18446744073709551615 flags hashpspool stripe_width 0 application rgw
pool 16 'default.rgw.buckets.data' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 39787 owner 18446744073709551615 flags hashpspool stripe_width 0 application rgw
pool 17 'default.rgw.buckets.non-ec' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 39789 owner 18446744073709551615 flags hashpspool stripe_width 0 application rgw
pool 18 'rbd-test' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 300 pgp_num 300 last_change 62647 flags hashpspool stripe_width 0 application rbd
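As a quick consistency check, the pool definitions above can be combined with the OSD count from `ceph -s` to get the average number of PG copies per OSD. This is a sketch using only numbers from the output; it ignores the differences in CRUSH weight between the old and new OSDs.

```python
# pg_num per pool id, copied from `ceph osd pool ls detail` above.
pg_num = {
    1: 8, 2: 8, 3: 8, 4: 8,
    5: 5120, 6: 2048, 8: 1000,
    9: 300, 10: 300, 11: 300, 12: 300, 13: 300, 14: 300,
    15: 8, 16: 8, 17: 8,
    18: 300,
}
replica_size = 3   # every pool is "replicated size 3"
osds_in = 209      # "213 up, 209 in" from `ceph -s`

total_pgs = sum(pg_num.values())
pg_copies = total_pgs * replica_size
avg_per_osd = pg_copies / osds_in

print(len(pg_num), total_pgs, pg_copies, round(avg_per_osd, 1))
```

The totals match the 17 pools and 10324 PGs reported by `ceph -s`, and the average of about 148 PG copies per OSD is in line with the PGS column of `ceph osd df tree`.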
# ceph osd df tree
ID CLASS WEIGHT REWEIGHT SIZE USE AVAIL %USE VAR PGS TYPE NAME
-1 918.75201 - 904T 424T 479T 47.00 1.00 - root default
-19 29.10876 - 29807G 16576G 13231G 55.61 1.18 - host ark0001
64 hdd 3.63860 1.00000 3725G 2010G 1715G 53.96 1.15 149 osd.64
65 hdd 3.63860 0.84999 3725G 2144G 1581G 57.55 1.22 141 osd.65
66 hdd 3.63860 0.99995 3725G 2266G 1459G 60.83 1.29 159 osd.66
67 hdd 3.63860 1.00000 3725G 1747G 1978G 46.89 1.00 117 osd.67
68 hdd 3.63860 0.90002 3725G 2126G 1599G 57.08 1.21 140 osd.68
69 hdd 3.63860 1.00000 3725G 1883G 1841G 50.56 1.08 143 osd.69
70 hdd 3.63860 1.00000 3725G 2217G 1508G 59.51 1.27 150 osd.70
71 hdd 3.63860 0.95001 3725G 2179G 1546G 58.50 1.24 144 osd.71
-17 29.10876 - 29807G 16035G 13772G 53.80 1.14 - host ark0002
56 hdd 3.63860 1.00000 3725G 1919G 1806G 51.50 1.10 131 osd.56
57 hdd 3.63860 1.00000 3725G 2092G 1633G 56.16 1.19 138 osd.57
58 hdd 3.63860 1.00000 3725G 2094G 1631G 56.22 1.20 126 osd.58
59 hdd 3.63860 1.00000 3725G 1966G 1759G 52.78 1.12 128 osd.59
60 hdd 3.63860 1.00000 3725G 2443G 1281G 65.59 1.40 133 osd.60
61 hdd 3.63860 1.00000 3725G 2038G 1687G 54.72 1.16 151 osd.61
62 hdd 3.63860 1.00000 3725G 1537G 2188G 41.26 0.88 131 osd.62
63 hdd 3.63860 0.84999 3725G 1943G 1782G 52.15 1.11 126 osd.63
-3 29.10876 - 29807G 16368G 13438G 54.91 1.17 - host ark0003
0 hdd 3.63860 1.00000 3725G 1925G 1800G 51.67 1.10 125 osd.0
1 hdd 3.63860 1.00000 3725G 1806G 1919G 48.48 1.03 131 osd.1
2 hdd 3.63860 0.84999 3725G 2018G 1707G 54.17 1.15 127 osd.2
3 hdd 3.63860 1.00000 3725G 2272G 1453G 61.00 1.30 142 osd.3
4 hdd 3.63860 1.00000 3725G 2313G 1412G 62.08 1.32 155 osd.4
5 hdd 3.63860 1.00000 3725G 2144G 1581G 57.56 1.22 143 osd.5
6 hdd 3.63860 1.00000 3725G 1755G 1970G 47.11 1.00 132 osd.6
8 hdd 3.63860 1.00000 3725G 2132G 1593G 57.24 1.22 137 osd.8
-5 29.10876 - 29807G 16916G 12891G 56.75 1.21 - host ark0004
7 hdd 3.63860 1.00000 3725G 2097G 1628G 56.29 1.20 146 osd.7
9 hdd 3.63860 1.00000 3725G 2294G 1431G 61.59 1.31 138 osd.9
10 hdd 3.63860 1.00000 3725G 1937G 1788G 52.00 1.11 123 osd.10
11 hdd 3.63860 1.00000 3725G 2378G 1347G 63.82 1.36 150 osd.11
12 hdd 3.63860 1.00000 3725G 1863G 1862G 50.01 1.06 126 osd.12
13 hdd 3.63860 1.00000 3725G 2416G 1309G 64.86 1.38 156 osd.13
14 hdd 3.63860 1.00000 3725G 2223G 1502G 59.68 1.27 161 osd.14
15 hdd 3.63860 1.00000 3725G 1705G 2020G 45.76 0.97 129 osd.15
-7 29.10876 - 29807G 15622G 14184G 52.41 1.12 - host ark0005
16 hdd 3.63860 1.00000 3725G 2135G 1590G 57.32 1.22 142 osd.16
17 hdd 3.63860 1.00000 3725G 1741G 1984G 46.74 0.99 128 osd.17
18 hdd 3.63860 1.00000 3725G 1724G 2000G 46.30 0.98 116 osd.18
19 hdd 3.63860 1.00000 3725G 2392G 1333G 64.22 1.37 162 osd.19
20 hdd 3.63860 1.00000 3725G 2190G 1534G 58.80 1.25 139 osd.20
21 hdd 3.63860 1.00000 3725G 1886G 1839G 50.64 1.08 135 osd.21
23 hdd 3.63860 1.00000 3725G 1968G 1757G 52.84 1.12 135 osd.23
24 hdd 3.63860 1.00000 3725G 1581G 2144G 42.45 0.90 116 osd.24
-9 25.47017 - 11177G 6687G 4489G 59.83 1.27 - host ark0006
22 hdd 3.63860 0 0 0 0 0 0 51 osd.22
25 hdd 3.63860 0 0 0 0 0 0 40 osd.25
26 hdd 3.63860 1.00000 3725G 2153G 1572G 57.80 1.23 138 osd.26
27 hdd 3.63860 0 0 0 0 0 0 42 osd.27
28 hdd 3.63860 0.84999 3725G 2413G 1312G 64.78 1.38 155 osd.28
29 hdd 3.63860 0 0 0 0 0 0 52 osd.29
32 hdd 3.63860 1.00000 3725G 2120G 1605G 56.91 1.21 138 osd.32
-11 25.47017 - 26081G 15146G 10935G 58.07 1.24 - host ark0007
31 hdd 3.63860 1.00000 3725G 2284G 1441G 61.31 1.30 155 osd.31
33 hdd 3.63860 1.00000 3725G 2020G 1705G 54.24 1.15 139 osd.33
34 hdd 3.63860 1.00000 3725G 2453G 1272G 65.85 1.40 157 osd.34
35 hdd 3.63860 1.00000 3725G 2389G 1336G 64.13 1.36 156 osd.35
36 hdd 3.63860 1.00000 3725G 2149G 1575G 57.70 1.23 147 osd.36
37 hdd 3.63860 1.00000 3725G 1718G 2007G 46.13 0.98 124 osd.37
39 hdd 3.63860 1.00000 3725G 2129G 1596G 57.14 1.22 157 osd.39
-21 29.10876 - 29807G 16720G 13086G 56.10 1.19 - host ark0008
72 hdd 3.63860 1.00000 3725G 2073G 1652G 55.66 1.18 143 osd.72
73 hdd 3.63860 1.00000 3725G 1910G 1815G 51.26 1.09 137 osd.73
74 hdd 3.63860 1.00000 3725G 2327G 1398G 62.48 1.33 158 osd.74
75 hdd 3.63860 1.00000 3725G 2059G 1666G 55.27 1.18 137 osd.75
76 hdd 3.63860 1.00000 3725G 2268G 1457G 60.89 1.30 150 osd.76
77 hdd 3.63860 1.00000 3725G 2158G 1567G 57.94 1.23 137 osd.77
78 hdd 3.63860 1.00000 3725G 1903G 1822G 51.09 1.09 136 osd.78
79 hdd 3.63860 1.00000 3725G 2018G 1706G 54.19 1.15 135 osd.79
-13 29.10876 - 29807G 16465G 13342G 55.24 1.18 - host ark0009
40 hdd 3.63860 1.00000 3725G 2112G 1613G 56.70 1.21 146 osd.40
41 hdd 3.63860 1.00000 3725G 1883G 1842G 50.56 1.08 133 osd.41
42 hdd 3.63860 1.00000 3725G 1945G 1780G 52.22 1.11 115 osd.42
43 hdd 3.63860 1.00000 3725G 1971G 1754G 52.91 1.13 132 osd.43
44 hdd 3.63860 1.00000 3725G 1892G 1833G 50.78 1.08 134 osd.44
45 hdd 3.63860 1.00000 3725G 1884G 1841G 50.59 1.08 137 osd.45
47 hdd 3.63860 1.00000 3725G 2248G 1477G 60.36 1.28 145 osd.47
49 hdd 3.63860 1.00000 3725G 2525G 1200G 67.78 1.44 146 osd.49
-15 29.10876 - 29807G 17561G 12246G 58.92 1.25 - host ark0010
46 hdd 3.63860 0.90002 3725G 2397G 1328G 64.35 1.37 131 osd.46
48 hdd 3.63860 1.00000 3725G 2542G 1183G 68.24 1.45 164 osd.48
50 hdd 3.63860 1.00000 3725G 2378G 1347G 63.84 1.36 144 osd.50
51 hdd 3.63860 1.00000 3725G 2413G 1312G 64.78 1.38 151 osd.51
52 hdd 3.63860 1.00000 3725G 1467G 2258G 39.39 0.84 136 osd.52
53 hdd 3.63860 1.00000 3725G 2117G 1608G 56.84 1.21 132 osd.53
54 hdd 3.63860 1.00000 3725G 2148G 1577G 57.66 1.23 140 osd.54
55 hdd 3.63860 1.00000 3725G 2095G 1630G 56.23 1.20 146 osd.55
-35 29.10876 - 29807G 16582G 13224G 55.63 1.18 - host ark0014
121 hdd 3.63860 1.00000 3725G 2116G 1609G 56.81 1.21 134 osd.121
122 hdd 3.63860 1.00000 3725G 2114G 1611G 56.74 1.21 127 osd.122
124 hdd 3.63860 1.00000 3725G 2043G 1682G 54.85 1.17 129 osd.124
126 hdd 3.63860 1.00000 3725G 1913G 1812G 51.35 1.09 126 osd.126
128 hdd 3.63860 1.00000 3725G 2004G 1721G 53.79 1.14 120 osd.128
130 hdd 3.63860 0.84999 3725G 2180G 1544G 58.53 1.25 131 osd.130
132 hdd 3.63860 1.00000 3725G 2109G 1616G 56.62 1.20 124 osd.132
134 hdd 3.63860 1.00000 3725G 2099G 1626G 56.36 1.20 126 osd.134
-33 29.10876 - 29807G 16896G 12910G 56.69 1.21 - host ark0015
120 hdd 3.63860 1.00000 3725G 2052G 1673G 55.07 1.17 134 osd.120
123 hdd 3.63860 0.99994 3725G 2431G 1294G 65.27 1.39 151 osd.123
125 hdd 3.63860 1.00000 3725G 2051G 1674G 55.06 1.17 124 osd.125
127 hdd 3.63860 1.00000 3725G 2106G 1619G 56.53 1.20 137 osd.127
129 hdd 3.63860 1.00000 3725G 1772G 1953G 47.57 1.01 136 osd.129
131 hdd 3.63860 1.00000 3725G 2179G 1546G 58.49 1.24 149 osd.131
133 hdd 3.63860 1.00000 3725G 2000G 1725G 53.69 1.14 156 osd.133
135 hdd 3.63860 1.00000 3725G 2303G 1422G 61.81 1.32 162 osd.135
-23 29.10876 - 29807G 16551G 13255G 55.53 1.18 - host ark0016
80 hdd 3.63860 1.00000 3725G 1946G 1779G 52.24 1.11 126 osd.80
83 hdd 3.63860 1.00000 3725G 1807G 1918G 48.50 1.03 138 osd.83
88 hdd 3.63860 1.00000 3725G 2295G 1430G 61.61 1.31 154 osd.88
92 hdd 3.63860 1.00000 3725G 2249G 1476G 60.37 1.28 146 osd.92
96 hdd 3.63860 1.00000 3725G 1947G 1778G 52.26 1.11 125 osd.96
101 hdd 3.63860 1.00000 3725G 2391G 1333G 64.20 1.37 156 osd.101
105 hdd 3.63860 1.00000 3725G 1961G 1764G 52.64 1.12 129 osd.105
110 hdd 3.63860 1.00000 3725G 1952G 1772G 52.42 1.12 131 osd.110
-27 29.10876 - 29807G 15892G 13914G 53.32 1.13 - host ark0017
81 hdd 3.63860 1.00000 3725G 2118G 1607G 56.85 1.21 134 osd.81
85 hdd 3.63860 0.99995 3725G 2013G 1712G 54.04 1.15 138 osd.85
90 hdd 3.63860 0.84998 3725G 2267G 1458G 60.87 1.29 145 osd.90
94 hdd 3.63860 1.00000 3725G 1541G 2184G 41.37 0.88 117 osd.94
100 hdd 3.63860 1.00000 3725G 1829G 1896G 49.10 1.04 127 osd.100
103 hdd 3.63860 1.00000 3725G 1940G 1785G 52.08 1.11 138 osd.103
109 hdd 3.63860 1.00000 3725G 1751G 1974G 47.00 1.00 116 osd.109
112 hdd 3.63860 1.00000 3725G 2430G 1295G 65.23 1.39 158 osd.112
-31 29.10876 - 29807G 15454G 14353G 51.85 1.10 - host ark0018
89 hdd 3.63860 1.00000 3725G 2291G 1434G 61.50 1.31 169 osd.89
98 hdd 3.63860 1.00000 3725G 2048G 1677G 54.99 1.17 151 osd.98
107 hdd 3.63860 1.00000 3725G 1678G 2047G 45.04 0.96 117 osd.107
115 hdd 3.63860 1.00000 3725G 2043G 1682G 54.84 1.17 139 osd.115
116 hdd 3.63860 1.00000 3725G 1741G 1984G 46.75 0.99 149 osd.116
117 hdd 3.63860 1.00000 3725G 1726G 1999G 46.35 0.99 129 osd.117
118 hdd 3.63860 1.00000 3725G 1937G 1788G 52.01 1.11 149 osd.118
119 hdd 3.63860 1.00000 3725G 1986G 1739G 53.31 1.13 130 osd.119
-29 29.10876 - 29807G 15590G 14217G 52.30 1.11 - host ark0019
84 hdd 3.63860 1.00000 3725G 1857G 1868G 49.84 1.06 119 osd.84
86 hdd 3.63860 1.00000 3725G 1964G 1761G 52.72 1.12 142 osd.86
93 hdd 3.63860 0.50000 3725G 1038G 2686G 27.89 0.59 72 osd.93
97 hdd 3.63860 1.00000 3725G 2156G 1569G 57.88 1.23 141 osd.97
102 hdd 3.63860 1.00000 3725G 1874G 1851G 50.31 1.07 129 osd.102
106 hdd 3.63860 0.95001 3725G 2397G 1328G 64.34 1.37 148 osd.106
111 hdd 3.63860 0.99995 3725G 2286G 1439G 61.37 1.31 139 osd.111
114 hdd 3.63860 1.00000 3725G 2014G 1711G 54.07 1.15 145 osd.114
-25 29.10876 - 29807G 16555G 13251G 55.54 1.18 - host ark0020
82 hdd 3.63860 1.00000 3725G 2172G 1553G 58.31 1.24 146 osd.82
87 hdd 3.63860 1.00000 3725G 2424G 1301G 65.06 1.38 145 osd.87
91 hdd 3.63860 1.00000 3725G 2254G 1471G 60.51 1.29 148 osd.91
95 hdd 3.63860 1.00000 3725G 2076G 1649G 55.72 1.19 140 osd.95
99 hdd 3.63860 1.00000 3725G 1706G 2019G 45.79 0.97 133 osd.99
104 hdd 3.63860 1.00000 3725G 2127G 1598G 57.10 1.21 146 osd.104
108 hdd 3.63860 1.00000 3725G 2075G 1650G 55.72 1.19 133 osd.108
113 hdd 3.63860 1.00000 3725G 1719G 2006G 46.14 0.98 130 osd.113
-43 87.32764 - 89424G 34318G 55106G 38.38 0.82 - host storage024
183 hdd 5.45798 1.00000 5589G 2516G 3072G 45.03 0.96 178 osd.183
184 hdd 5.45798 1.00000 5589G 1852G 3736G 33.15 0.71 141 osd.184
185 hdd 5.45798 1.00000 5589G 2024G 3564G 36.23 0.77 143 osd.185
186 hdd 5.45798 1.00000 5589G 1979G 3609G 35.43 0.75 149 osd.186
187 hdd 5.45798 1.00000 5589G 2256G 3332G 40.38 0.86 148 osd.187
188 hdd 5.45798 1.00000 5589G 2365G 3223G 42.32 0.90 168 osd.188
189 hdd 5.45798 1.00000 5589G 2220G 3368G 39.73 0.85 151 osd.189
190 hdd 5.45798 1.00000 5589G 1956G 3632G 35.01 0.74 151 osd.190
191 hdd 5.45798 1.00000 5589G 2156G 3432G 38.58 0.82 171 osd.191
192 hdd 5.45798 1.00000 5589G 1891G 3697G 33.85 0.72 148 osd.192
193 hdd 5.45798 1.00000 5589G 2360G 3228G 42.23 0.90 170 osd.193
194 hdd 5.45798 1.00000 5589G 2319G 3269G 41.50 0.88 153 osd.194
195 hdd 5.45798 1.00000 5589G 1928G 3660G 34.50 0.73 148 osd.195
196 hdd 5.45798 1.00000 5589G 2355G 3233G 42.15 0.90 149 osd.196
197 hdd 5.45798 1.00000 5589G 1996G 3593G 35.71 0.76 160 osd.197
198 hdd 5.45798 1.00000 5589G 2136G 3452G 38.23 0.81 179 osd.198
-45 87.32764 - 89424G 34290G 55134G 38.35 0.82 - host storage025
199 hdd 5.45798 1.00000 5589G 2226G 3362G 39.83 0.85 172 osd.199
200 hdd 5.45798 1.00000 5589G 1864G 3724G 33.37 0.71 133 osd.200
201 hdd 5.45798 1.00000 5589G 2065G 3523G 36.96 0.79 162 osd.201
202 hdd 5.45798 1.00000 5589G 2385G 3203G 42.69 0.91 154 osd.202
203 hdd 5.45798 1.00000 5589G 2239G 3349G 40.07 0.85 186 osd.203
204 hdd 5.45798 1.00000 5589G 2085G 3503G 37.31 0.79 172 osd.204
205 hdd 5.45798 1.00000 5589G 2478G 3110G 44.34 0.94 181 osd.205
206 hdd 5.45798 1.00000 5589G 1815G 3773G 32.49 0.69 150 osd.206
207 hdd 5.45798 1.00000 5589G 2159G 3429G 38.63 0.82 155 osd.207
208 hdd 5.45798 1.00000 5589G 2004G 3584G 35.87 0.76 120 osd.208
209 hdd 5.45798 1.00000 5589G 2350G 3238G 42.06 0.89 155 osd.209
210 hdd 5.45798 1.00000 5589G 2136G 3452G 38.23 0.81 161 osd.210
211 hdd 5.45798 1.00000 5589G 2409G 3179G 43.11 0.92 158 osd.211
212 hdd 5.45798 1.00000 5589G 1975G 3613G 35.34 0.75 148 osd.212
213 hdd 5.45798 1.00000 5589G 1989G 3599G 35.60 0.76 158 osd.213
214 hdd 5.45798 1.00000 5589G 2102G 3486G 37.62 0.80 147 osd.214
-41 87.32764 - 89424G 33898G 55525G 37.91 0.81 - host storage026
167 hdd 5.45798 1.00000 5589G 1847G 3741G 33.05 0.70 141 osd.167
168 hdd 5.45798 1.00000 5589G 2341G 3247G 41.90 0.89 183 osd.168
169 hdd 5.45798 1.00000 5589G 1763G 3825G 31.55 0.67 142 osd.169
170 hdd 5.45798 1.00000 5589G 2147G 3441G 38.42 0.82 153 osd.170
171 hdd 5.45798 1.00000 5589G 2306G 3282G 41.27 0.88 148 osd.171
172 hdd 5.45798 1.00000 5589G 2135G 3453G 38.21 0.81 164 osd.172
173 hdd 5.45798 1.00000 5589G 2308G 3280G 41.31 0.88 165 osd.173
174 hdd 5.45798 1.00000 5589G 2045G 3543G 36.61 0.78 151 osd.174
175 hdd 5.45798 1.00000 5589G 2116G 3472G 37.86 0.81 140 osd.175
176 hdd 5.45798 1.00000 5589G 1632G 3956G 29.22 0.62 125 osd.176
177 hdd 5.45798 1.00000 5589G 2380G 3208G 42.60 0.91 151 osd.177
178 hdd 5.45798 1.00000 5589G 2339G 3249G 41.86 0.89 168 osd.178
179 hdd 5.45798 1.00000 5589G 2223G 3365G 39.78 0.85 163 osd.179
180 hdd 5.45798 1.00000 5589G 1996G 3592G 35.73 0.76 161 osd.180
181 hdd 5.45798 1.00000 5589G 2130G 3458G 38.12 0.81 169 osd.181
182 hdd 5.45798 1.00000 5589G 2182G 3406G 39.04 0.83 157 osd.182
-39 87.32764 - 89424G 32801G 56623G 36.68 0.78 - host storage027
151 hdd 5.45798 1.00000 5589G 1816G 3772G 32.50 0.69 163 osd.151
152 hdd 5.45798 1.00000 5589G 2097G 3491G 37.52 0.80 159 osd.152
153 hdd 5.45798 1.00000 5589G 1911G 3677G 34.20 0.73 145 osd.153
154 hdd 5.45798 1.00000 5589G 1741G 3847G 31.16 0.66 145 osd.154
155 hdd 5.45798 1.00000 5589G 1979G 3609G 35.41 0.75 147 osd.155
156 hdd 5.45798 1.00000 5589G 1864G 3724G 33.36 0.71 160 osd.156
157 hdd 5.45798 1.00000 5589G 2054G 3534G 36.77 0.78 156 osd.157
158 hdd 5.45798 1.00000 5589G 2825G 2763G 50.56 1.08 181 osd.158
159 hdd 5.45798 1.00000 5589G 1982G 3606G 35.48 0.75 147 osd.159
160 hdd 5.45798 1.00000 5589G 2115G 3473G 37.86 0.81 151 osd.160
161 hdd 5.45798 1.00000 5589G 2166G 3422G 38.76 0.82 156 osd.161
162 hdd 5.45798 1.00000 5589G 2121G 3467G 37.95 0.81 151 osd.162
163 hdd 5.45798 1.00000 5589G 2107G 3481G 37.70 0.80 153 osd.163
164 hdd 5.45798 1.00000 5589G 1897G 3691G 33.94 0.72 149 osd.164
165 hdd 5.45798 1.00000 5589G 2127G 3461G 38.07 0.81 152 osd.165
166 hdd 5.45798 1.00000 5589G 1991G 3597G 35.64 0.76 185 osd.166
-37 81.86966 - 83835G 32262G 51572G 38.48 0.82 - host storage028
136 hdd 5.45798 1.00000 5589G 2139G 3449G 38.27 0.81 165 osd.136
137 hdd 5.45798 1.00000 5589G 2125G 3463G 38.02 0.81 150 osd.137
138 hdd 5.45798 1.00000 5589G 2208G 3380G 39.52 0.84 182 osd.138
139 hdd 5.45798 1.00000 5589G 2608G 2980G 46.68 0.99 180 osd.139
140 hdd 5.45798 1.00000 5589G 2086G 3502G 37.33 0.79 145 osd.140
141 hdd 5.45798 1.00000 5589G 2220G 3368G 39.73 0.85 163 osd.141
142 hdd 5.45798 1.00000 5589G 2284G 3304G 40.88 0.87 186 osd.142
143 hdd 5.45798 1.00000 5589G 1868G 3720G 33.43 0.71 150 osd.143
144 hdd 5.45798 1.00000 5589G 2090G 3498G 37.41 0.80 161 osd.144
145 hdd 5.45798 1.00000 5589G 1964G 3624G 35.15 0.75 167 osd.145
146 hdd 5.45798 1.00000 5589G 2201G 3387G 39.39 0.84 158 osd.146
147 hdd 5.45798 1.00000 5589G 2250G 3338G 40.28 0.86 161 osd.147
148 hdd 5.45798 1.00000 5589G 2050G 3538G 36.69 0.78 151 osd.148
149 hdd 5.45798 1.00000 5589G 2210G 3378G 39.55 0.84 182 osd.149
150 hdd 5.45798 1.00000 5589G 1951G 3637G 34.92 0.74 166 osd.150
TOTAL 904T 424T 479T 47.00
MIN/MAX VAR: 0.59/1.45 STDDEV: 10.18
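The host rows above show the new storage hosts well below the cluster average, which gives a rough idea of how much data still has to land on them. The sketch below assumes that, once backfill finishes, the new hosts end up at the cluster-wide average utilization; real placement depends on CRUSH and the reweight values, so treat the result as an order-of-magnitude figure only.

```python
# Host rows (SIZE, USE in GiB) for the five new hosts, copied from the
# `ceph osd df tree` output above.
new_hosts = {
    "storage024": (89424, 34318),
    "storage025": (89424, 34290),
    "storage026": (89424, 33898),
    "storage027": (89424, 32801),
    "storage028": (83835, 32262),
}
cluster_use_pct = 47.00   # %USE from the TOTAL line

size_gib = sum(size for size, _ in new_hosts.values())
used_gib = sum(used for _, used in new_hosts.values())
current_pct = 100 * used_gib / size_gib

# ASSUMPTION: the new hosts converge to the cluster average utilization.
still_to_arrive_gib = size_gib * cluster_use_pct / 100 - used_gib

print(f"new hosts currently at {current_pct:.1f}% used")
print(f"roughly {still_to_arrive_gib / 1024:.0f} TiB still to arrive")
```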
# ceph versions
{
    "mon": {
        "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 3
    },
    "mgr": {
        "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 4
    },
    "osd": {
        "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 68,
        "ceph version 12.2.2 (cf0baeeeeba3b47f9427c6c97e2144b094b7e5ba) luminous (stable)": 64,
        "ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable)": 2,
        "ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0) luminous (stable)": 79
    },
    "mds": {
        "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 1,
        "ceph version 12.2.2 (cf0baeeeeba3b47f9427c6c97e2144b094b7e5ba) luminous (stable)": 3
    },
    "rgw": {
        "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 1
    },
    "overall": {
        "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 77,
        "ceph version 12.2.2 (cf0baeeeeba3b47f9427c6c97e2144b094b7e5ba) luminous (stable)": 67,
        "ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable)": 2,
        "ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0) luminous (stable)": 79
    }
}
On Sat, Sep 15, 2018 at 10:45 PM Paul Emmerich <paul.emmerich@xxxxxxxx> wrote:
Well, that's not a lot of information to troubleshoot such a problem.
Please post the output of the following commands:
* ceph -s
* ceph health detail
* ceph osd pool ls detail
* ceph osd tree
* ceph osd df tree
* ceph versions
And a description of what you did to upgrade it.
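The commands listed above could be collected into a single report with something like the following sketch. The function names are made up for illustration, and it assumes the `ceph` CLI and a usable keyring are available on the node where it runs.

```python
import subprocess

# The commands requested above, one argument list per command.
COMMANDS = [
    ["ceph", "-s"],
    ["ceph", "health", "detail"],
    ["ceph", "osd", "pool", "ls", "detail"],
    ["ceph", "osd", "tree"],
    ["ceph", "osd", "df", "tree"],
    ["ceph", "versions"],
]

def run(cmd):
    """Run one command and return its stdout; raises if the command fails."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout

def collect(runner=run):
    """Return one text report: each command as a '# ...' line, then its output."""
    parts = []
    for cmd in COMMANDS:
        parts.append("# " + " ".join(cmd))
        parts.append(runner(cmd).rstrip())
    return "\n".join(parts) + "\n"

# Usage on a cluster node (not executed here):
#   print(collect(), end="")
```

The `runner` parameter exists only so the report layout can be tried out without a cluster, for example with `collect(lambda cmd: "sample output\n")`.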
2018-09-15 15:46 GMT+02:00 Frank Yu <flyxiaoyu@xxxxxxxxx>:
> Hello there,
> I have a ceph cluster which increase from 400TB to 900 TB recently, now the
> cluster is in unhealthy status, there're about 1700+ pg in unclean status
> # ceph pg dump_stuck unclean|wc
> ok
> 1696 10176 191648
> the cephfs can't work anymore, the read io was no more than MB/s.
> Is there any way to fix the unclean pg quickly?
> --
> Regards
> Frank Yu
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Paul Emmerich
Frank Yu