Hi everyone. Yesterday I found that on
our overcrowded Hammer Ceph cluster (83% used in the HDD
pool) several OSDs were in the danger zone, near 95% full.
I reweighted them, and a few moments later I got
PGs stuck in backfill_toofull.
Currently all reweights are equal to 1.0, and Ceph does
nothing: no rebalancing and no recovery.
ID  WEIGHT   REWEIGHT SIZE   USE    AVAIL %USE  VAR  TYPE NAME
-1  30.65996        - 37970G 29370G 8599G 77.35 1.00 root default
-6  18.65996        - 20100G 16681G 3419G 82.99 1.07 region HDD
-3   6.09000        -  6700G  5539G 1160G 82.68 1.07 host ceph03.HDD
 1   1.00000  1.00000  1116G   841G  274G 75.39 0.97 osd.1
 5   1.00000  1.00000  1116G   916G  200G 82.07 1.06 osd.5
 3   1.00000  1.00000  1116G   939G  177G 84.14 1.09 osd.3
 8   1.09000  1.00000  1116G   952G  164G 85.29 1.10 osd.8
 7   1.00000  1.00000  1116G   972G  143G 87.11 1.13 osd.7
11   1.00000  1.00000  1116G   916G  200G 82.08 1.06 osd.11
-4   6.16998        -  6700G  5612G 1088G 83.76 1.08 host ceph02.HDD
14   1.09000  1.00000  1116G   950G  165G 85.16 1.10 osd.14
13   0.89999  1.00000  1116G   949G  167G 85.03 1.10 osd.13
16   1.09000  1.00000  1116G   921G  195G 82.50 1.07 osd.16
17   1.00000  1.00000  1116G   899G  216G 80.59 1.04 osd.17
10   1.09000  1.00000  1116G   952G  164G 85.28 1.10 osd.10
15   1.00000  1.00000  1116G   938G  178G 84.02 1.09 osd.15
-2   6.39998        -  6700G  5529G 1170G 82.53 1.07 host ceph01.HDD
12   1.09000  1.00000  1116G   953G  163G 85.39 1.10 osd.12
 9   0.95000  1.00000  1116G   939G  177G 84.14 1.09 osd.9
 2   1.09000  1.00000  1116G   911G  204G 81.64 1.06 osd.2
 0   1.09000  1.00000  1116G   951G  165G 85.22 1.10 osd.0
 6   1.09000  1.00000  1116G   917G  199G 82.12 1.06 osd.6
 4   1.09000  1.00000  1116G   856G  260G 76.67 0.99 osd.4
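
To make the listing above easier to read, here is a rough Python sketch that picks out the OSDs sitting at or above a backfill-full threshold. The %USE values are copied by hand from the output above. The 85% threshold is an assumption on my side: I believe it matches the default osd_backfill_full_ratio (0.85) in Hammer, but it should be checked against the running config. The names OSD_USE_PCT, BACKFILL_FULL_PCT and osds_over are made up for this illustration.

```python
# %USE per OSD id, copied by hand from the usage listing above.
OSD_USE_PCT = {
    # host ceph03.HDD
    1: 75.39, 5: 82.07, 3: 84.14, 8: 85.29, 7: 87.11, 11: 82.08,
    # host ceph02.HDD
    14: 85.16, 13: 85.03, 16: 82.50, 17: 80.59, 10: 85.28, 15: 84.02,
    # host ceph01.HDD
    12: 85.39, 9: 84.14, 2: 81.64, 0: 85.22, 6: 82.12, 4: 76.67,
}

# Assumption: 85% is the backfill-full threshold in effect
# (believed to be the Hammer default; verify on the actual cluster).
BACKFILL_FULL_PCT = 85.0


def osds_over(use_pct, threshold):
    """Return (osd_id, pct) pairs at or above threshold, fullest first."""
    hits = [(osd, pct) for osd, pct in use_pct.items() if pct >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for osd, pct in osds_over(OSD_USE_PCT, BACKFILL_FULL_PCT):
        print(f"osd.{osd}: {pct:.2f}%")
```

If the assumed threshold is right, any OSD this prints would refuse to accept backfill, which might be related to the PGs stuck in backfill_toofull; with a different configured ratio the list changes accordingly.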