Hi,
We had a hardware failure on one node. When it came back, one OSD (489) was showing as up but was not serving IO. We stopped that OSD and set its CRUSH weight to 0, and two of its PGs moved to two different OSDs (490 and 492). This triggered rebalancing, and those two PGs are now stuck inactive and incomplete. We are on Hammer 0.94.9. What is the best way to fix these two PGs? We have already tried a PG repair and a journal flush.
We also stopped OSD 490 (the primary for PG 13.110c), moved the PG directory to a backup directory, and started the OSD again. This brought back the full 15 GB of PG data, but the PG still shows as inactive and incomplete.
What is the best way to deal with these inactive and incomplete PGs?
  health HEALTH_WARN
      24 pgs backfill
      4 pgs backfilling
      2 pgs incomplete
      2 pgs stuck inactive
      30 pgs stuck unclean
      103 requests are blocked > 32 sec
      recovery 12/240361644 objects degraded (0.000%)
      recovery 151856/240361644 objects misplaced (0.063%)
  monmap e3: 3 mons at {no1dra300=10.42.120.18:6789/0,no1dra301=10.42.120.10:6789/0,no1dra302=10.42.120.20:6789/0}
      election epoch 180, quorum 0,1,2 no1dra301,no1dra300,no1dra302
  osdmap e423459: 959 osds: 957 up, 957 in; 28 remapped pgs
  pgmap v46862311: 39936 pgs, 12 pools, 342 TB data, 78217 kobjects
      1032 TB used, 708 TB / 1740 TB avail
      12/240361644 objects degraded (0.000%)
      151856/240361644 objects misplaced (0.063%)
         39905 active+clean
            24 active+remapped+wait_backfill
             4 active+remapped+backfilling
             2 incomplete
             1 active+clean+scrubbing+deep
  recovery io 78207 kB/s, 19 objects/s
  client io 26384 kB/s rd, 47708 kB/s wr, 7941 op/s
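As a quick sanity check on the status above, the per-state PG counts should add up to the pgmap total (39905 + 24 + 4 + 2 + 1 = 39936), which leaves only the 2 incomplete PGs outside an active state. A minimal sketch of that check, assuming the status text is piped in on stdin; `sum_pg_states` is a hypothetical helper name, not a Ceph command:

```shell
#!/bin/sh
# Hypothetical helper (not part of Ceph): total the per-state PG counts from
# the pgmap section of a status dump read on stdin.
sum_pg_states() {
    awk '
        # Keep only two-field lines of the form "<count> <state>",
        # e.g. "39905 active+clean" or "2 incomplete".
        NF == 2 && $1 ~ /^[0-9]+$/ && $2 ~ /^[a-z_+]+$/ {
            total += $1
            # Count PGs whose state string does not start with "active".
            if ($2 !~ /^active/) not_active += $1
        }
        END { printf "total=%d not_active=%d\n", total, not_active }
    '
}
```

For example, `ceph -s | sum_pg_states` on the status above should print `total=39936 not_active=2`.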
ceph health detail | grep -i incomplete
HEALTH_WARN 24 pgs backfill; 4 pgs backfilling; 2 pgs incomplete; 2 pgs stuck inactive; 30 pgs stuck unclean; 106 requests are blocked > 32 sec; 2 osds have slow requests; recovery 12/240361650 objects degraded (0.000%); recovery 151582/240361650 objects misplaced (0.063%)
pg 13.110c is stuck inactive since forever, current state incomplete, last acting [490,16,120]
pg 7.9b7 is stuck inactive since forever, current state incomplete, last acting [492,680,265]
pg 7.9b7 is stuck unclean since forever, current state incomplete, last acting [492,680,265]
pg 13.110c is stuck unclean since forever, current state incomplete, last acting [490,16,120]
pg 13.110c is incomplete, acting [490,16,120] (reducing pool volumes min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 7.9b7 is incomplete, acting [492,680,265] (reducing pool images min_size from 2 may help; search ceph.com/docs for 'incomplete')
HEALTH_WARN 24 pgs backfill; 4 pgs backfilling; 2 pgs incomplete; 2 pgs stuck inactive; 30 pgs stuck unclean; 115 requests are blocked > 32 sec; 2 osds have slow requests; recovery 12/240361653 objects degraded (0.000%); recovery 151527/240361653 objects misplaced (0.063%)
pg 13.110c is stuck inactive since forever, current state incomplete, last acting [490,16,120]
pg 7.9b7 is stuck inactive since forever, current state incomplete, last acting [492,680,265]
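To pull the affected PGs and their acting sets out of `ceph health detail`, a small filter like the one below may help. It is a sketch only: `list_incomplete_pgs` is a hypothetical helper, and it matches just the `pg <pgid> is incomplete, acting [...]` line format shown in this output. Each PG it prints can then be inspected with `ceph pg <pgid> query`, whose `recovery_state` section typically shows where peering is stuck.

```shell
#!/bin/sh
# Hypothetical helper (not part of Ceph): print "<pgid> <acting set>" for each
# PG that `ceph health detail` (read on stdin) reports as incomplete.
list_incomplete_pgs() {
    # Match lines such as:
    #   pg 13.110c is incomplete, acting [490,16,120] (...)
    # The "stuck inactive"/"stuck unclean" lines do not match this pattern.
    sed -n 's/^pg \([0-9a-f.]*\) is incomplete, acting \(\[[0-9,]*]\).*/\1 \2/p' |
        sort -u
}
```

On the output above, `ceph health detail | list_incomplete_pgs` should print `13.110c [490,16,120]` and `7.9b7 [492,680,265]`.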
Thanks,
-- Pardhiv Karri
"Rise and Rise again until LAMBS become LIONS"
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com