Hi everyone. Yesterday I found that on our overcrowded Hammer Ceph cluster (83% used in the HDD pool) several OSDs were in the danger zone, near 95% full.
I reweighted them, and shortly afterwards I got PGs stuck in backfill_toofull.
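For reference, I applied the reweights with the usual osd reweight command, roughly like this (OSD ids and values here are illustrative, not the exact ones I used):

ceph osd reweight 7 0.90
ceph osd reweight 8 0.90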
After that, I reapplied the reweight to those OSDs, with no luck.
Currently all reweights are back to 1.0, and Ceph is doing nothing: no rebalancing, no recovery.
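The stuck PGs are still visible; for anyone who wants to reproduce the view, this is what I'm looking at:

ceph health detail | grep backfill_toofull
ceph pg dump_stuck unclean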
How can I make Ceph recover these PGs?
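The only idea I have so far is to temporarily raise the backfill threshold so the toofull PGs can proceed (the default osd_backfill_full_ratio is 0.85), something like the line below. I have not applied it yet, and I'm not sure it is safe on a cluster this full:

ceph tell osd.* injectargs '--osd-backfill-full-ratio 0.92'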
ceph -s
health HEALTH_WARN
47 pgs backfill_toofull
47 pgs stuck unclean
recovery 16/9422472 objects degraded (0.000%)
recovery 365332/9422472 objects misplaced (3.877%)
7 near full osd(s)
ceph osd df tree
ID WEIGHT REWEIGHT SIZE USE AVAIL %USE VAR TYPE NAME
-1 30.65996 - 37970G 29370G 8599G 77.35 1.00 root default
-6 18.65996 - 20100G 16681G 3419G 82.99 1.07 region HDD
-3 6.09000 - 6700G 5539G 1160G 82.68 1.07 host ceph03.HDD
1 1.00000 1.00000 1116G 841G 274G 75.39 0.97 osd.1
5 1.00000 1.00000 1116G 916G 200G 82.07 1.06 osd.5
3 1.00000 1.00000 1116G 939G 177G 84.14 1.09 osd.3
8 1.09000 1.00000 1116G 952G 164G 85.29 1.10 osd.8
7 1.00000 1.00000 1116G 972G 143G 87.11 1.13 osd.7
11 1.00000 1.00000 1116G 916G 200G 82.08 1.06 osd.11
-4 6.16998 - 6700G 5612G 1088G 83.76 1.08 host ceph02.HDD
14 1.09000 1.00000 1116G 950G 165G 85.16 1.10 osd.14
13 0.89999 1.00000 1116G 949G 167G 85.03 1.10 osd.13
16 1.09000 1.00000 1116G 921G 195G 82.50 1.07 osd.16
17 1.00000 1.00000 1116G 899G 216G 80.59 1.04 osd.17
10 1.09000 1.00000 1116G 952G 164G 85.28 1.10 osd.10
15 1.00000 1.00000 1116G 938G 178G 84.02 1.09 osd.15
-2 6.39998 - 6700G 5529G 1170G 82.53 1.07 host ceph01.HDD
12 1.09000 1.00000 1116G 953G 163G 85.39 1.10 osd.12
9 0.95000 1.00000 1116G 939G 177G 84.14 1.09 osd.9
2 1.09000 1.00000 1116G 911G 204G 81.64 1.06 osd.2
0 1.09000 1.00000 1116G 951G 165G 85.22 1.10 osd.0
6 1.09000 1.00000 1116G 917G 199G 82.12 1.06 osd.6
4 1.09000 1.00000 1116G 856G 260G 76.67 0.99 osd.4