This looks fine and will recover on its own.

If you are not seeing enough client IO, your tuning of recovery IO vs. client IO priority is off. A simple and effective knob is to increase the osd_recovery_sleep_hdd option (I think the default is 0.05 in Luminous and 0.1 since Mimic?), which throttles recovery speed. A short runtime example is sketched at the very bottom of this message, below the quoted output.

Paul

2018-09-15 17:31 GMT+02:00 Frank Yu <flyxiaoyu@xxxxxxxxx>:
> Hi Paul,
>
> Before the upgrade there were 17 OSD servers (8 OSDs per server), 3 mds/rgw
> nodes and 2 active MDS. I then added 5 OSD servers (16 OSDs per server),
> after which one active MDS server crashed (I rebooted it) and the MDS could
> not get back to health any more. So I added two new MDS servers and removed
> one of the original MDS servers. I first set the new OSDs' crush weight to 1
> (they are 6 TB OSDs) and the cluster started rebalancing; before the
> rebalance finished I changed the weight to 5.45798.
>
> More info below.
>
> ------------------------------------
> # ceph -s
>   cluster:
>     id:     a00cc99c-f9f9-4dd9-9281-43cd12310e41
>     health: HEALTH_WARN
>             28750646/577747527 objects misplaced (4.976%)
>             Degraded data redundancy: 2724676/577747527 objects degraded
>             (0.472%), 1476 pgs unclean, 451 pgs degraded, 356 pgs undersized
>
>   services:
>     mon: 3 daemons, quorum ark0008,ark0009,ark0010
>     mgr: ark0009(active), standbys: ark0010, ark0008, ark0008.hobot.cc
>     mds: cephfs-2/2/1 up
>          {0=ark0018.hobot.cc=up:active,1=ark0020.hobot.cc=up:active}, 2 up:standby
>     osd: 213 osds: 213 up, 209 in; 1433 remapped pgs
>     rgw: 1 daemon active
>
>   data:
>     pools:   17 pools, 10324 pgs
>     objects: 183M objects, 124 TB
>     usage:   425 TB used, 479 TB / 904 TB avail
>     pgs:     2724676/577747527 objects degraded (0.472%)
>              28750646/577747527 objects misplaced (4.976%)
>              8848 active+clean
>              565  active+remapped+backfilling
>              449  active+remapped+backfill_wait
>              319  active+undersized+degraded+remapped+backfilling
>              36   active+undersized+degraded+remapped+backfill_wait
>              32   active+recovery_wait+degraded
>              29   active+recovery_wait+degraded+remapped
>              20   active+degraded+remapped+backfill_wait
>              14   active+degraded+remapped+backfilling
>              11   active+recovery_wait
>              1    active+recovery_wait+undersized+degraded+remapped
>
>   io:
>     client:   2356 B/s rd, 9051 kB/s wr, 0 op/s rd, 185 op/s wr
>     recovery: 459 MB/s, 709 objects/s
> ------------------------------------
> # ceph health detail
> HEALTH_WARN 28736684/577747554 objects misplaced (4.974%); Degraded data
> redundancy: 2722451/577747554 objects degraded (0.471%), 1475 pgs unclean,
> 451 pgs degraded, 356 pgs undersized
> pg 5.dee is stuck unclean for 93114.056729, current state
> active+remapped+backfilling, last acting [19,153,64]
> pg 5.df4 is stuck undersized for 86028.395042, current state
> active+undersized+degraded+remapped+backfilling, last acting [81,83]
> pg 5.df8 is stuck unclean for 10529.471700, current state
> active+remapped+backfilling, last acting [53,212,106]
> pg 5.dfa is stuck unclean for 86193.279939, current state
> active+remapped+backfill_wait, last acting [58,122,98]
> pg 5.dfd is stuck unclean for 21944.059088, current state
> active+remapped+backfilling, last acting [119,91,22]
> pg 5.e01 is stuck undersized for 73773.177963, current state
> active+undersized+degraded+remapped+backfilling, last acting [88,116]
> pg 5.e02 is stuck undersized for 10615.864226, current state
> active+undersized+degraded+remapped+backfilling, last acting [112,110]
> pg 5.e04 is active+degraded+remapped+backfilling, acting [44,10,104]
> pg 5.e07 is stuck undersized for 86060.059937, current state
active+undersized+degraded+remapped+backfilling, last acting [100,65] > pg 5.e09 is stuck unclean for 86247.708352, current state > active+remapped+backfilling, last acting [19,187,46] > pg 5.e0a is stuck unclean for 93073.574629, current state > active+remapped+backfilling, last acting [92,13,118] > pg 5.e0b is stuck unclean for 86247.949138, current state > active+remapped+backfilling, last acting [31,54,68] > pg 5.e10 is stuck unclean for 17390.342397, current state > active+remapped+backfill_wait, last acting [71,202,119] > pg 5.e13 is stuck unclean for 93092.549049, current state > active+remapped+backfilling, last acting [33,90,110] > pg 5.e16 is stuck unclean for 86250.883911, current state > active+remapped+backfill_wait, last acting [79,108,56] > pg 5.e17 is stuck undersized for 15167.783137, current state > active+undersized+degraded+remapped+backfill_wait, last acting [42,28] > pg 5.e18 is stuck unclean for 18122.375128, current state > active+remapped+backfill_wait, last acting [26,43,31] > pg 5.e20 is stuck unclean for 86255.524287, current state > active+remapped+backfilling, last acting [122,52,7] > pg 5.e27 is stuck unclean for 10706.283143, current state > active+remapped+backfill_wait, last acting [56,104,73] > pg 5.e29 is stuck undersized for 86036.590643, current state > active+undersized+degraded+remapped+backfilling, last acting [49,35] > pg 5.e2c is stuck unclean for 86257.751565, current state > active+remapped+backfilling, last acting [70,106,91] > pg 5.e2e is stuck undersized for 10615.804510, current state > active+undersized+degraded+remapped+backfilling, last acting [35,103] > pg 5.e32 is stuck undersized for 74758.649684, current state > active+undersized+degraded+remapped+backfilling, last acting [39,53] > pg 5.e35 is stuck unclean for 86195.364365, current state > active+remapped+backfill_wait, last acting [60,133,71] > pg 5.e36 is stuck unclean for 10706.301969, current state > active+remapped+backfilling, last acting [119,132,27] > pg 5.e39 is stuck undersized for 7625.972530, current state > active+undersized+degraded+remapped+backfill_wait, last acting [59,68] > pg 5.e3e is stuck undersized for 15167.771334, current state > active+undersized+degraded+remapped+backfilling, last acting [37,108] > pg 5.e3f is stuck unclean for 86446.144228, current state > active+remapped+backfilling, last acting [36,77,70] > pg 5.e41 is stuck undersized for 85809.892887, current state > active+undersized+degraded+remapped+backfilling, last acting [34,74] > pg 5.e42 is stuck unclean for 93066.921410, current state > active+remapped+backfill_wait, last acting [50,116,203] > pg 5.e43 is stuck unclean for 86440.050082, current state > active+remapped+backfilling, last acting [101,62,98] > pg 5.e45 is stuck undersized for 57193.815788, current state > active+undersized+degraded+remapped+backfilling, last acting [46,89] > pg 5.e47 is stuck undersized for 10855.704014, current state > active+undersized+degraded+remapped+backfill_wait, last acting [45,159] > pg 5.e48 is stuck unclean for 86257.863404, current state > active+remapped+backfill_wait, last acting [179,124,56] > pg 5.e49 is stuck unclean for 86243.704781, current state > active+remapped+backfill_wait, last acting [19,70,97] > pg 5.e4b is stuck unclean for 93119.500757, current state > active+remapped+backfilling, last acting [19,97,28] > pg 5.e52 is active+recovery_wait+degraded+remapped, acting [35,60,16] > pg 5.e53 is stuck unclean for 93107.389507, current state > active+remapped+backfilling, last acting [66,137,87] 
> pg 5.e5a is stuck unclean for 6972.965649, current state > active+remapped+backfilling, last acting [110,210,25] > pg 5.e5c is stuck unclean for 86252.299945, current state > active+remapped+backfill_wait, last acting [60,77,106] > pg 5.e5d is stuck unclean for 11554.357804, current state > active+remapped+backfill_wait, last acting [136,91,45] > pg 5.e5f is stuck unclean for 86252.173042, current state > active+remapped+backfilling, last acting [66,0,7] > pg 5.e60 is stuck undersized for 74698.272305, current state > active+undersized+degraded+remapped+backfilling, last acting [89,44] > pg 5.e61 is stuck unclean for 10529.466552, current state > active+remapped+backfilling, last acting [44,0,28] > pg 5.e63 is stuck unclean for 93120.827771, current state > active+remapped+backfill_wait, last acting [26,123,42] > pg 5.e65 is stuck unclean for 86212.097907, current state > active+remapped+backfilling, last acting [28,37,82] > pg 5.e71 is stuck unclean for 86223.575372, current state > active+remapped+backfill_wait, last acting [121,56,110] > pg 5.e72 is stuck unclean for 16576.615045, current state > active+remapped+backfill_wait, last acting [40,134,105] > pg 5.e74 is stuck unclean for 86221.162039, current state > active+remapped+backfill_wait, last acting [76,169,59] > pg 5.e77 is stuck unclean for 93066.629341, current state > active+remapped+backfilling, last acting [50,72,137] > pg 5.e79 is stuck unclean for 11772.868324, current state > active+remapped+backfill_wait, last acting [52,24,101] > ------------------------------------ > # ceph osd pool ls detail > pool 1 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 object_hash > rjenkins pg_num 8 pgp_num 8 last_change 271 flags hashpspool stripe_width 0 > application rgw > pool 2 'default.rgw.control' replicated size 3 min_size 2 crush_rule 0 > object_hash rjenkins pg_num 8 pgp_num 8 last_change 275 flags hashpspool > stripe_width 0 application rgw > pool 3 'default.rgw.meta' replicated size 3 min_size 2 crush_rule 0 > object_hash rjenkins pg_num 8 pgp_num 8 last_change 279 flags hashpspool > stripe_width 0 application rgw > pool 4 'default.rgw.log' replicated size 3 min_size 2 crush_rule 0 > object_hash rjenkins pg_num 8 pgp_num 8 last_change 281 flags hashpspool > stripe_width 0 application rgw > pool 5 'cephfs_data' replicated size 3 min_size 2 crush_rule 0 object_hash > rjenkins pg_num 5120 pgp_num 5120 last_change 101613 lfor 0/21288 flags > hashpspool max_bytes 500000000000000 stripe_width 0 application cephfs > pool 6 'cephfs_metadata' replicated size 3 min_size 2 crush_rule 0 > object_hash rjenkins pg_num 2048 pgp_num 2048 last_change 5118 lfor 0/5108 > flags hashpspool stripe_width 0 application cephfs > pool 8 'rbd' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins > pg_num 1000 pgp_num 1000 last_change 1984 lfor 0/1981 flags hashpspool > max_bytes 100000000000000 stripe_width 0 application rbd > removed_snaps [1~3] > pool 9 'rbd-algorithm' replicated size 3 min_size 2 crush_rule 0 object_hash > rjenkins pg_num 300 pgp_num 300 last_change 578 flags hashpspool > stripe_width 0 application rbd > pool 10 'rbd-smartlife' replicated size 3 min_size 2 crush_rule 0 > object_hash rjenkins pg_num 300 pgp_num 300 last_change 579 flags hashpspool > stripe_width 0 application rbd > pool 11 'rbd-smartcity' replicated size 3 min_size 2 crush_rule 0 > object_hash rjenkins pg_num 300 pgp_num 300 last_change 580 flags hashpspool > stripe_width 0 application rbd > pool 12 'rbd-smartauto' replicated size 3 min_size 2 
crush_rule 0 > object_hash rjenkins pg_num 300 pgp_num 300 last_change 581 flags hashpspool > stripe_width 0 application rbd > pool 13 'rbd-infras' replicated size 3 min_size 2 crush_rule 0 object_hash > rjenkins pg_num 300 pgp_num 300 last_change 584 flags hashpspool > stripe_width 0 application rbd > removed_snaps [1~3] > pool 14 'rbd-env-home' replicated size 3 min_size 2 crush_rule 0 object_hash > rjenkins pg_num 300 pgp_num 300 last_change 1154 flags hashpspool > stripe_width 0 application rbd > removed_snaps [1~5] > pool 15 'default.rgw.buckets.index' replicated size 3 min_size 2 crush_rule > 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 39782 owner > 18446744073709551615 flags hashpspool stripe_width 0 application rgw > pool 16 'default.rgw.buckets.data' replicated size 3 min_size 2 crush_rule 0 > object_hash rjenkins pg_num 8 pgp_num 8 last_change 39787 owner > 18446744073709551615 flags hashpspool stripe_width 0 application rgw > pool 17 'default.rgw.buckets.non-ec' replicated size 3 min_size 2 crush_rule > 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 39789 owner > 18446744073709551615 flags hashpspool stripe_width 0 application rgw > pool 18 'rbd-test' replicated size 3 min_size 2 crush_rule 0 object_hash > rjenkins pg_num 300 pgp_num 300 last_change 62647 flags hashpspool > stripe_width 0 application rbd > > ------------------------------------ > # ceph osd df tree > ID CLASS WEIGHT REWEIGHT SIZE USE AVAIL %USE VAR PGS TYPE NAME > -1 918.75201 - 904T 424T 479T 47.00 1.00 - root > default > -19 29.10876 - 29807G 16576G 13231G 55.61 1.18 - host > ark0001 > 64 hdd 3.63860 1.00000 3725G 2010G 1715G 53.96 1.15 149 > osd.64 > 65 hdd 3.63860 0.84999 3725G 2144G 1581G 57.55 1.22 141 > osd.65 > 66 hdd 3.63860 0.99995 3725G 2266G 1459G 60.83 1.29 159 > osd.66 > 67 hdd 3.63860 1.00000 3725G 1747G 1978G 46.89 1.00 117 > osd.67 > 68 hdd 3.63860 0.90002 3725G 2126G 1599G 57.08 1.21 140 > osd.68 > 69 hdd 3.63860 1.00000 3725G 1883G 1841G 50.56 1.08 143 > osd.69 > 70 hdd 3.63860 1.00000 3725G 2217G 1508G 59.51 1.27 150 > osd.70 > 71 hdd 3.63860 0.95001 3725G 2179G 1546G 58.50 1.24 144 > osd.71 > -17 29.10876 - 29807G 16035G 13772G 53.80 1.14 - host > ark0002 > 56 hdd 3.63860 1.00000 3725G 1919G 1806G 51.50 1.10 131 > osd.56 > 57 hdd 3.63860 1.00000 3725G 2092G 1633G 56.16 1.19 138 > osd.57 > 58 hdd 3.63860 1.00000 3725G 2094G 1631G 56.22 1.20 126 > osd.58 > 59 hdd 3.63860 1.00000 3725G 1966G 1759G 52.78 1.12 128 > osd.59 > 60 hdd 3.63860 1.00000 3725G 2443G 1281G 65.59 1.40 133 > osd.60 > 61 hdd 3.63860 1.00000 3725G 2038G 1687G 54.72 1.16 151 > osd.61 > 62 hdd 3.63860 1.00000 3725G 1537G 2188G 41.26 0.88 131 > osd.62 > 63 hdd 3.63860 0.84999 3725G 1943G 1782G 52.15 1.11 126 > osd.63 > -3 29.10876 - 29807G 16368G 13438G 54.91 1.17 - host > ark0003 > 0 hdd 3.63860 1.00000 3725G 1925G 1800G 51.67 1.10 125 > osd.0 > 1 hdd 3.63860 1.00000 3725G 1806G 1919G 48.48 1.03 131 > osd.1 > 2 hdd 3.63860 0.84999 3725G 2018G 1707G 54.17 1.15 127 > osd.2 > 3 hdd 3.63860 1.00000 3725G 2272G 1453G 61.00 1.30 142 > osd.3 > 4 hdd 3.63860 1.00000 3725G 2313G 1412G 62.08 1.32 155 > osd.4 > 5 hdd 3.63860 1.00000 3725G 2144G 1581G 57.56 1.22 143 > osd.5 > 6 hdd 3.63860 1.00000 3725G 1755G 1970G 47.11 1.00 132 > osd.6 > 8 hdd 3.63860 1.00000 3725G 2132G 1593G 57.24 1.22 137 > osd.8 > -5 29.10876 - 29807G 16916G 12891G 56.75 1.21 - host > ark0004 > 7 hdd 3.63860 1.00000 3725G 2097G 1628G 56.29 1.20 146 > osd.7 > 9 hdd 3.63860 1.00000 3725G 2294G 1431G 61.59 1.31 138 > osd.9 > 10 hdd 3.63860 1.00000 
3725G 1937G 1788G 52.00 1.11 123 > osd.10 > 11 hdd 3.63860 1.00000 3725G 2378G 1347G 63.82 1.36 150 > osd.11 > 12 hdd 3.63860 1.00000 3725G 1863G 1862G 50.01 1.06 126 > osd.12 > 13 hdd 3.63860 1.00000 3725G 2416G 1309G 64.86 1.38 156 > osd.13 > 14 hdd 3.63860 1.00000 3725G 2223G 1502G 59.68 1.27 161 > osd.14 > 15 hdd 3.63860 1.00000 3725G 1705G 2020G 45.76 0.97 129 > osd.15 > -7 29.10876 - 29807G 15622G 14184G 52.41 1.12 - host > ark0005 > 16 hdd 3.63860 1.00000 3725G 2135G 1590G 57.32 1.22 142 > osd.16 > 17 hdd 3.63860 1.00000 3725G 1741G 1984G 46.74 0.99 128 > osd.17 > 18 hdd 3.63860 1.00000 3725G 1724G 2000G 46.30 0.98 116 > osd.18 > 19 hdd 3.63860 1.00000 3725G 2392G 1333G 64.22 1.37 162 > osd.19 > 20 hdd 3.63860 1.00000 3725G 2190G 1534G 58.80 1.25 139 > osd.20 > 21 hdd 3.63860 1.00000 3725G 1886G 1839G 50.64 1.08 135 > osd.21 > 23 hdd 3.63860 1.00000 3725G 1968G 1757G 52.84 1.12 135 > osd.23 > 24 hdd 3.63860 1.00000 3725G 1581G 2144G 42.45 0.90 116 > osd.24 > -9 25.47017 - 11177G 6687G 4489G 59.83 1.27 - host > ark0006 > 22 hdd 3.63860 0 0 0 0 0 0 51 > osd.22 > 25 hdd 3.63860 0 0 0 0 0 0 40 > osd.25 > 26 hdd 3.63860 1.00000 3725G 2153G 1572G 57.80 1.23 138 > osd.26 > 27 hdd 3.63860 0 0 0 0 0 0 42 > osd.27 > 28 hdd 3.63860 0.84999 3725G 2413G 1312G 64.78 1.38 155 > osd.28 > 29 hdd 3.63860 0 0 0 0 0 0 52 > osd.29 > 32 hdd 3.63860 1.00000 3725G 2120G 1605G 56.91 1.21 138 > osd.32 > -11 25.47017 - 26081G 15146G 10935G 58.07 1.24 - host > ark0007 > 31 hdd 3.63860 1.00000 3725G 2284G 1441G 61.31 1.30 155 > osd.31 > 33 hdd 3.63860 1.00000 3725G 2020G 1705G 54.24 1.15 139 > osd.33 > 34 hdd 3.63860 1.00000 3725G 2453G 1272G 65.85 1.40 157 > osd.34 > 35 hdd 3.63860 1.00000 3725G 2389G 1336G 64.13 1.36 156 > osd.35 > 36 hdd 3.63860 1.00000 3725G 2149G 1575G 57.70 1.23 147 > osd.36 > 37 hdd 3.63860 1.00000 3725G 1718G 2007G 46.13 0.98 124 > osd.37 > 39 hdd 3.63860 1.00000 3725G 2129G 1596G 57.14 1.22 157 > osd.39 > -21 29.10876 - 29807G 16720G 13086G 56.10 1.19 - host > ark0008 > 72 hdd 3.63860 1.00000 3725G 2073G 1652G 55.66 1.18 143 > osd.72 > 73 hdd 3.63860 1.00000 3725G 1910G 1815G 51.26 1.09 137 > osd.73 > 74 hdd 3.63860 1.00000 3725G 2327G 1398G 62.48 1.33 158 > osd.74 > 75 hdd 3.63860 1.00000 3725G 2059G 1666G 55.27 1.18 137 > osd.75 > 76 hdd 3.63860 1.00000 3725G 2268G 1457G 60.89 1.30 150 > osd.76 > 77 hdd 3.63860 1.00000 3725G 2158G 1567G 57.94 1.23 137 > osd.77 > 78 hdd 3.63860 1.00000 3725G 1903G 1822G 51.09 1.09 136 > osd.78 > 79 hdd 3.63860 1.00000 3725G 2018G 1706G 54.19 1.15 135 > osd.79 > -13 29.10876 - 29807G 16465G 13342G 55.24 1.18 - host > ark0009 > 40 hdd 3.63860 1.00000 3725G 2112G 1613G 56.70 1.21 146 > osd.40 > 41 hdd 3.63860 1.00000 3725G 1883G 1842G 50.56 1.08 133 > osd.41 > 42 hdd 3.63860 1.00000 3725G 1945G 1780G 52.22 1.11 115 > osd.42 > 43 hdd 3.63860 1.00000 3725G 1971G 1754G 52.91 1.13 132 > osd.43 > 44 hdd 3.63860 1.00000 3725G 1892G 1833G 50.78 1.08 134 > osd.44 > 45 hdd 3.63860 1.00000 3725G 1884G 1841G 50.59 1.08 137 > osd.45 > 47 hdd 3.63860 1.00000 3725G 2248G 1477G 60.36 1.28 145 > osd.47 > 49 hdd 3.63860 1.00000 3725G 2525G 1200G 67.78 1.44 146 > osd.49 > -15 29.10876 - 29807G 17561G 12246G 58.92 1.25 - host > ark0010 > 46 hdd 3.63860 0.90002 3725G 2397G 1328G 64.35 1.37 131 > osd.46 > 48 hdd 3.63860 1.00000 3725G 2542G 1183G 68.24 1.45 164 > osd.48 > 50 hdd 3.63860 1.00000 3725G 2378G 1347G 63.84 1.36 144 > osd.50 > 51 hdd 3.63860 1.00000 3725G 2413G 1312G 64.78 1.38 151 > osd.51 > 52 hdd 3.63860 1.00000 3725G 1467G 2258G 39.39 0.84 136 > osd.52 > 53 hdd 
3.63860 1.00000 3725G 2117G 1608G 56.84 1.21 132 > osd.53 > 54 hdd 3.63860 1.00000 3725G 2148G 1577G 57.66 1.23 140 > osd.54 > 55 hdd 3.63860 1.00000 3725G 2095G 1630G 56.23 1.20 146 > osd.55 > -35 29.10876 - 29807G 16582G 13224G 55.63 1.18 - host > ark0014 > 121 hdd 3.63860 1.00000 3725G 2116G 1609G 56.81 1.21 134 > osd.121 > 122 hdd 3.63860 1.00000 3725G 2114G 1611G 56.74 1.21 127 > osd.122 > 124 hdd 3.63860 1.00000 3725G 2043G 1682G 54.85 1.17 129 > osd.124 > 126 hdd 3.63860 1.00000 3725G 1913G 1812G 51.35 1.09 126 > osd.126 > 128 hdd 3.63860 1.00000 3725G 2004G 1721G 53.79 1.14 120 > osd.128 > 130 hdd 3.63860 0.84999 3725G 2180G 1544G 58.53 1.25 131 > osd.130 > 132 hdd 3.63860 1.00000 3725G 2109G 1616G 56.62 1.20 124 > osd.132 > 134 hdd 3.63860 1.00000 3725G 2099G 1626G 56.36 1.20 126 > osd.134 > -33 29.10876 - 29807G 16896G 12910G 56.69 1.21 - host > ark0015 > 120 hdd 3.63860 1.00000 3725G 2052G 1673G 55.07 1.17 134 > osd.120 > 123 hdd 3.63860 0.99994 3725G 2431G 1294G 65.27 1.39 151 > osd.123 > 125 hdd 3.63860 1.00000 3725G 2051G 1674G 55.06 1.17 124 > osd.125 > 127 hdd 3.63860 1.00000 3725G 2106G 1619G 56.53 1.20 137 > osd.127 > 129 hdd 3.63860 1.00000 3725G 1772G 1953G 47.57 1.01 136 > osd.129 > 131 hdd 3.63860 1.00000 3725G 2179G 1546G 58.49 1.24 149 > osd.131 > 133 hdd 3.63860 1.00000 3725G 2000G 1725G 53.69 1.14 156 > osd.133 > 135 hdd 3.63860 1.00000 3725G 2303G 1422G 61.81 1.32 162 > osd.135 > -23 29.10876 - 29807G 16551G 13255G 55.53 1.18 - host > ark0016 > 80 hdd 3.63860 1.00000 3725G 1946G 1779G 52.24 1.11 126 > osd.80 > 83 hdd 3.63860 1.00000 3725G 1807G 1918G 48.50 1.03 138 > osd.83 > 88 hdd 3.63860 1.00000 3725G 2295G 1430G 61.61 1.31 154 > osd.88 > 92 hdd 3.63860 1.00000 3725G 2249G 1476G 60.37 1.28 146 > osd.92 > 96 hdd 3.63860 1.00000 3725G 1947G 1778G 52.26 1.11 125 > osd.96 > 101 hdd 3.63860 1.00000 3725G 2391G 1333G 64.20 1.37 156 > osd.101 > 105 hdd 3.63860 1.00000 3725G 1961G 1764G 52.64 1.12 129 > osd.105 > 110 hdd 3.63860 1.00000 3725G 1952G 1772G 52.42 1.12 131 > osd.110 > -27 29.10876 - 29807G 15892G 13914G 53.32 1.13 - host > ark0017 > 81 hdd 3.63860 1.00000 3725G 2118G 1607G 56.85 1.21 134 > osd.81 > 85 hdd 3.63860 0.99995 3725G 2013G 1712G 54.04 1.15 138 > osd.85 > 90 hdd 3.63860 0.84998 3725G 2267G 1458G 60.87 1.29 145 > osd.90 > 94 hdd 3.63860 1.00000 3725G 1541G 2184G 41.37 0.88 117 > osd.94 > 100 hdd 3.63860 1.00000 3725G 1829G 1896G 49.10 1.04 127 > osd.100 > 103 hdd 3.63860 1.00000 3725G 1940G 1785G 52.08 1.11 138 > osd.103 > 109 hdd 3.63860 1.00000 3725G 1751G 1974G 47.00 1.00 116 > osd.109 > 112 hdd 3.63860 1.00000 3725G 2430G 1295G 65.23 1.39 158 > osd.112 > -31 29.10876 - 29807G 15454G 14353G 51.85 1.10 - host > ark0018 > 89 hdd 3.63860 1.00000 3725G 2291G 1434G 61.50 1.31 169 > osd.89 > 98 hdd 3.63860 1.00000 3725G 2048G 1677G 54.99 1.17 151 > osd.98 > 107 hdd 3.63860 1.00000 3725G 1678G 2047G 45.04 0.96 117 > osd.107 > 115 hdd 3.63860 1.00000 3725G 2043G 1682G 54.84 1.17 139 > osd.115 > 116 hdd 3.63860 1.00000 3725G 1741G 1984G 46.75 0.99 149 > osd.116 > 117 hdd 3.63860 1.00000 3725G 1726G 1999G 46.35 0.99 129 > osd.117 > 118 hdd 3.63860 1.00000 3725G 1937G 1788G 52.01 1.11 149 > osd.118 > 119 hdd 3.63860 1.00000 3725G 1986G 1739G 53.31 1.13 130 > osd.119 > -29 29.10876 - 29807G 15590G 14217G 52.30 1.11 - host > ark0019 > 84 hdd 3.63860 1.00000 3725G 1857G 1868G 49.84 1.06 119 > osd.84 > 86 hdd 3.63860 1.00000 3725G 1964G 1761G 52.72 1.12 142 > osd.86 > 93 hdd 3.63860 0.50000 3725G 1038G 2686G 27.89 0.59 72 > osd.93 > 97 hdd 3.63860 1.00000 
3725G 2156G 1569G 57.88 1.23 141 > osd.97 > 102 hdd 3.63860 1.00000 3725G 1874G 1851G 50.31 1.07 129 > osd.102 > 106 hdd 3.63860 0.95001 3725G 2397G 1328G 64.34 1.37 148 > osd.106 > 111 hdd 3.63860 0.99995 3725G 2286G 1439G 61.37 1.31 139 > osd.111 > 114 hdd 3.63860 1.00000 3725G 2014G 1711G 54.07 1.15 145 > osd.114 > -25 29.10876 - 29807G 16555G 13251G 55.54 1.18 - host > ark0020 > 82 hdd 3.63860 1.00000 3725G 2172G 1553G 58.31 1.24 146 > osd.82 > 87 hdd 3.63860 1.00000 3725G 2424G 1301G 65.06 1.38 145 > osd.87 > 91 hdd 3.63860 1.00000 3725G 2254G 1471G 60.51 1.29 148 > osd.91 > 95 hdd 3.63860 1.00000 3725G 2076G 1649G 55.72 1.19 140 > osd.95 > 99 hdd 3.63860 1.00000 3725G 1706G 2019G 45.79 0.97 133 > osd.99 > 104 hdd 3.63860 1.00000 3725G 2127G 1598G 57.10 1.21 146 > osd.104 > 108 hdd 3.63860 1.00000 3725G 2075G 1650G 55.72 1.19 133 > osd.108 > 113 hdd 3.63860 1.00000 3725G 1719G 2006G 46.14 0.98 130 > osd.113 > -43 87.32764 - 89424G 34318G 55106G 38.38 0.82 - host > storage024 > 183 hdd 5.45798 1.00000 5589G 2516G 3072G 45.03 0.96 178 > osd.183 > 184 hdd 5.45798 1.00000 5589G 1852G 3736G 33.15 0.71 141 > osd.184 > 185 hdd 5.45798 1.00000 5589G 2024G 3564G 36.23 0.77 143 > osd.185 > 186 hdd 5.45798 1.00000 5589G 1979G 3609G 35.43 0.75 149 > osd.186 > 187 hdd 5.45798 1.00000 5589G 2256G 3332G 40.38 0.86 148 > osd.187 > 188 hdd 5.45798 1.00000 5589G 2365G 3223G 42.32 0.90 168 > osd.188 > 189 hdd 5.45798 1.00000 5589G 2220G 3368G 39.73 0.85 151 > osd.189 > 190 hdd 5.45798 1.00000 5589G 1956G 3632G 35.01 0.74 151 > osd.190 > 191 hdd 5.45798 1.00000 5589G 2156G 3432G 38.58 0.82 171 > osd.191 > 192 hdd 5.45798 1.00000 5589G 1891G 3697G 33.85 0.72 148 > osd.192 > 193 hdd 5.45798 1.00000 5589G 2360G 3228G 42.23 0.90 170 > osd.193 > 194 hdd 5.45798 1.00000 5589G 2319G 3269G 41.50 0.88 153 > osd.194 > 195 hdd 5.45798 1.00000 5589G 1928G 3660G 34.50 0.73 148 > osd.195 > 196 hdd 5.45798 1.00000 5589G 2355G 3233G 42.15 0.90 149 > osd.196 > 197 hdd 5.45798 1.00000 5589G 1996G 3593G 35.71 0.76 160 > osd.197 > 198 hdd 5.45798 1.00000 5589G 2136G 3452G 38.23 0.81 179 > osd.198 > -45 87.32764 - 89424G 34290G 55134G 38.35 0.82 - host > storage025 > 199 hdd 5.45798 1.00000 5589G 2226G 3362G 39.83 0.85 172 > osd.199 > 200 hdd 5.45798 1.00000 5589G 1864G 3724G 33.37 0.71 133 > osd.200 > 201 hdd 5.45798 1.00000 5589G 2065G 3523G 36.96 0.79 162 > osd.201 > 202 hdd 5.45798 1.00000 5589G 2385G 3203G 42.69 0.91 154 > osd.202 > 203 hdd 5.45798 1.00000 5589G 2239G 3349G 40.07 0.85 186 > osd.203 > 204 hdd 5.45798 1.00000 5589G 2085G 3503G 37.31 0.79 172 > osd.204 > 205 hdd 5.45798 1.00000 5589G 2478G 3110G 44.34 0.94 181 > osd.205 > 206 hdd 5.45798 1.00000 5589G 1815G 3773G 32.49 0.69 150 > osd.206 > 207 hdd 5.45798 1.00000 5589G 2159G 3429G 38.63 0.82 155 > osd.207 > 208 hdd 5.45798 1.00000 5589G 2004G 3584G 35.87 0.76 120 > osd.208 > 209 hdd 5.45798 1.00000 5589G 2350G 3238G 42.06 0.89 155 > osd.209 > 210 hdd 5.45798 1.00000 5589G 2136G 3452G 38.23 0.81 161 > osd.210 > 211 hdd 5.45798 1.00000 5589G 2409G 3179G 43.11 0.92 158 > osd.211 > 212 hdd 5.45798 1.00000 5589G 1975G 3613G 35.34 0.75 148 > osd.212 > 213 hdd 5.45798 1.00000 5589G 1989G 3599G 35.60 0.76 158 > osd.213 > 214 hdd 5.45798 1.00000 5589G 2102G 3486G 37.62 0.80 147 > osd.214 > -41 87.32764 - 89424G 33898G 55525G 37.91 0.81 - host > storage026 > 167 hdd 5.45798 1.00000 5589G 1847G 3741G 33.05 0.70 141 > osd.167 > 168 hdd 5.45798 1.00000 5589G 2341G 3247G 41.90 0.89 183 > osd.168 > 169 hdd 5.45798 1.00000 5589G 1763G 3825G 31.55 0.67 142 > osd.169 > 170 
hdd 5.45798 1.00000 5589G 2147G 3441G 38.42 0.82 153 > osd.170 > 171 hdd 5.45798 1.00000 5589G 2306G 3282G 41.27 0.88 148 > osd.171 > 172 hdd 5.45798 1.00000 5589G 2135G 3453G 38.21 0.81 164 > osd.172 > 173 hdd 5.45798 1.00000 5589G 2308G 3280G 41.31 0.88 165 > osd.173 > 174 hdd 5.45798 1.00000 5589G 2045G 3543G 36.61 0.78 151 > osd.174 > 175 hdd 5.45798 1.00000 5589G 2116G 3472G 37.86 0.81 140 > osd.175 > 176 hdd 5.45798 1.00000 5589G 1632G 3956G 29.22 0.62 125 > osd.176 > 177 hdd 5.45798 1.00000 5589G 2380G 3208G 42.60 0.91 151 > osd.177 > 178 hdd 5.45798 1.00000 5589G 2339G 3249G 41.86 0.89 168 > osd.178 > 179 hdd 5.45798 1.00000 5589G 2223G 3365G 39.78 0.85 163 > osd.179 > 180 hdd 5.45798 1.00000 5589G 1996G 3592G 35.73 0.76 161 > osd.180 > 181 hdd 5.45798 1.00000 5589G 2130G 3458G 38.12 0.81 169 > osd.181 > 182 hdd 5.45798 1.00000 5589G 2182G 3406G 39.04 0.83 157 > osd.182 > -39 87.32764 - 89424G 32801G 56623G 36.68 0.78 - host > storage027 > 151 hdd 5.45798 1.00000 5589G 1816G 3772G 32.50 0.69 163 > osd.151 > 152 hdd 5.45798 1.00000 5589G 2097G 3491G 37.52 0.80 159 > osd.152 > 153 hdd 5.45798 1.00000 5589G 1911G 3677G 34.20 0.73 145 > osd.153 > 154 hdd 5.45798 1.00000 5589G 1741G 3847G 31.16 0.66 145 > osd.154 > 155 hdd 5.45798 1.00000 5589G 1979G 3609G 35.41 0.75 147 > osd.155 > 156 hdd 5.45798 1.00000 5589G 1864G 3724G 33.36 0.71 160 > osd.156 > 157 hdd 5.45798 1.00000 5589G 2054G 3534G 36.77 0.78 156 > osd.157 > 158 hdd 5.45798 1.00000 5589G 2825G 2763G 50.56 1.08 181 > osd.158 > 159 hdd 5.45798 1.00000 5589G 1982G 3606G 35.48 0.75 147 > osd.159 > 160 hdd 5.45798 1.00000 5589G 2115G 3473G 37.86 0.81 151 > osd.160 > 161 hdd 5.45798 1.00000 5589G 2166G 3422G 38.76 0.82 156 > osd.161 > 162 hdd 5.45798 1.00000 5589G 2121G 3467G 37.95 0.81 151 > osd.162 > 163 hdd 5.45798 1.00000 5589G 2107G 3481G 37.70 0.80 153 > osd.163 > 164 hdd 5.45798 1.00000 5589G 1897G 3691G 33.94 0.72 149 > osd.164 > 165 hdd 5.45798 1.00000 5589G 2127G 3461G 38.07 0.81 152 > osd.165 > 166 hdd 5.45798 1.00000 5589G 1991G 3597G 35.64 0.76 185 > osd.166 > -37 81.86966 - 83835G 32262G 51572G 38.48 0.82 - host > storage028 > 136 hdd 5.45798 1.00000 5589G 2139G 3449G 38.27 0.81 165 > osd.136 > 137 hdd 5.45798 1.00000 5589G 2125G 3463G 38.02 0.81 150 > osd.137 > 138 hdd 5.45798 1.00000 5589G 2208G 3380G 39.52 0.84 182 > osd.138 > 139 hdd 5.45798 1.00000 5589G 2608G 2980G 46.68 0.99 180 > osd.139 > 140 hdd 5.45798 1.00000 5589G 2086G 3502G 37.33 0.79 145 > osd.140 > 141 hdd 5.45798 1.00000 5589G 2220G 3368G 39.73 0.85 163 > osd.141 > 142 hdd 5.45798 1.00000 5589G 2284G 3304G 40.88 0.87 186 > osd.142 > 143 hdd 5.45798 1.00000 5589G 1868G 3720G 33.43 0.71 150 > osd.143 > 144 hdd 5.45798 1.00000 5589G 2090G 3498G 37.41 0.80 161 > osd.144 > 145 hdd 5.45798 1.00000 5589G 1964G 3624G 35.15 0.75 167 > osd.145 > 146 hdd 5.45798 1.00000 5589G 2201G 3387G 39.39 0.84 158 > osd.146 > 147 hdd 5.45798 1.00000 5589G 2250G 3338G 40.28 0.86 161 > osd.147 > 148 hdd 5.45798 1.00000 5589G 2050G 3538G 36.69 0.78 151 > osd.148 > 149 hdd 5.45798 1.00000 5589G 2210G 3378G 39.55 0.84 182 > osd.149 > 150 hdd 5.45798 1.00000 5589G 1951G 3637G 34.92 0.74 166 > osd.150 > TOTAL 904T 424T 479T 47.00 > MIN/MAX VAR: 0.59/1.45 STDDEV: 10.18 > > ------------------------------------ > # ceph versions > { > "mon": { > "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) > luminous (stable)": 3 > }, > "mgr": { > "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) > luminous (stable)": 4 > }, > "osd": { > "ceph version 12.2.1 
(3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 68,
>         "ceph version 12.2.2 (cf0baeeeeba3b47f9427c6c97e2144b094b7e5ba) luminous (stable)": 64,
>         "ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable)": 2,
>         "ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0) luminous (stable)": 79
>     },
>     "mds": {
>         "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 1,
>         "ceph version 12.2.2 (cf0baeeeeba3b47f9427c6c97e2144b094b7e5ba) luminous (stable)": 3
>     },
>     "rgw": {
>         "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 1
>     },
>     "overall": {
>         "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 77,
>         "ceph version 12.2.2 (cf0baeeeeba3b47f9427c6c97e2144b094b7e5ba) luminous (stable)": 67,
>         "ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable)": 2,
>         "ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0) luminous (stable)": 79
>     }
> }
>
> On Sat, Sep 15, 2018 at 10:45 PM Paul Emmerich <paul.emmerich@xxxxxxxx> wrote:
>>
>> Well, that's not a lot of information to troubleshoot such a problem.
>>
>> Please post the output of the following commands:
>>
>> * ceph -s
>> * ceph health detail
>> * ceph osd pool ls detail
>> * ceph osd tree
>> * ceph osd df tree
>> * ceph versions
>>
>> And a description of what you did to upgrade it.
>>
>> Paul
>>
>> 2018-09-15 15:46 GMT+02:00 Frank Yu <flyxiaoyu@xxxxxxxxx>:
>> > Hello there,
>> >
>> > I have a ceph cluster which increase from 400TB to 900 TB recently, now the
>> > cluster is in unhealthy status, there're about 1700+ pg in unclean status
>> >
>> > # ceph pg dump_stuck unclean|wc
>> > ok
>> > 1696 10176 191648
>> >
>> > the cephfs can't work anymore, the read io was no more than MB/s.
>> > Is there any way to fix the unclean pg quickly?
>> >
>> > --
>> > Regards
>> > Frank Yu
>> >
>> > _______________________________________________
>> > ceph-users mailing list
>> > ceph-users@xxxxxxxxxxxxxx
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>> --
>> Paul Emmerich
>>
>> Looking for help with your Ceph cluster? Contact us at https://croit.io
>>
>> croit GmbH
>> Freseniusstr. 31h
>> 81247 München
>> www.croit.io
>> Tel: +49 89 1896585 90
>
> --
> Regards
> Frank Yu

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
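A minimal sketch of how the recovery throttling Paul describes could be applied at runtime on a Luminous cluster with HDD OSDs. The 0.2 sleep value and the additional osd_max_backfills setting are illustrative assumptions, not values taken from this thread; check the current defaults on your own daemons first:

    # check the current value on one OSD (run on the host where osd.0 lives)
    ceph daemon osd.0 config get osd_recovery_sleep_hdd

    # inject a longer sleep between recovery ops on all OSDs to slow
    # recovery/backfill and leave more headroom for client IO
    ceph tell osd.* injectargs '--osd_recovery_sleep_hdd 0.2'

    # optionally also reduce the number of concurrent backfills per OSD
    ceph tell osd.* injectargs '--osd_max_backfills 1'

    # to persist across OSD restarts, add the same values to ceph.conf:
    # [osd]
    # osd_recovery_sleep_hdd = 0.2
    # osd_max_backfills = 1

Injected values take effect immediately but are lost when an OSD restarts, so once a setting that balances recovery and client IO has been found it is worth persisting it in ceph.conf as well.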