Forgot to mention the Ceph version: 0.94.5.
I managed to fix this. By chance I found that when an OSD hosting a blocked PG is starting, there is a few-second window (after load_pgs) in which it still accepts commands for that PG. So first I managed to capture a "ceph pg PGID query" this way. Then I tried to issue "ceph pg PGID mark_unfound_lost delete" and that worked too. After deleting all unfound objects this way, the cluster finally unblocked. Before that I exported all blocked PGs, so hopefully I will be able to recover those 17 objects to a near-latest state.
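In case someone wants to reproduce this, the procedure looked roughly like the sketch below. The OSD id, PG id, timeout and retry interval are placeholders, and the restart command depends on your init system (Hammer-era clusters typically use the sysvinit script):

    PGID=1.2f3    # placeholder PG id
    OSD=87        # placeholder OSD id

    # Restart the OSD that hosts the blocked PG.
    /etc/init.d/ceph restart osd.${OSD}    # or: systemctl restart ceph-osd@${OSD}

    # Race the command into the short window after load_pgs, before the
    # PG blocks again. "timeout" kills attempts that got stuck; if you
    # miss the window entirely, restart the OSD and try again.
    until timeout 10 ceph pg ${PGID} query > pg-${PGID}-query.json; do
        sleep 0.5
    done

    # Same trick for the actual fix:
    until timeout 10 ceph pg ${PGID} mark_unfound_lost delete; do
        sleep 0.5
    done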
Hope this helps anyone who might run into the same problem.
2016-10-01 14:27 GMT+02:00 Tomasz Kuzemko <tomasz@xxxxxxxxxxx>:
Hi,
I have a production cluster on which one OSD on a failing disk was slowing the whole cluster down. I removed the OSD (osd.87) as usual in such cases, but this time it resulted in 17 unfound objects. I no longer have the files from osd.87. I was able to call "ceph pg PGID mark_unfound_lost delete" on 10 of those objects.
On the remaining 7 objects the command blocks. When I try to do a "ceph pg PGID query" on such a PG, it also blocks. I suspect this is the same reason why mark_unfound_lost blocks.
Other client IO to PGs that have unfound objects is also blocked. When trying to query the OSDs that host the PGs with unfound objects, "ceph tell" blocks as well.
I tried to mark the PG as complete using ceph-objectstore-tool, but it did not help, as the PG is in fact complete yet blocks for some reason. I tried recreating an empty osd.87 and importing the PG exported from another replica, but that did not help either. Can someone help me please? This is really important.
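For reference, the export/import attempt looked roughly like this (the data/journal paths, source OSD id and PG id are placeholders; the OSD being operated on must be stopped first):

    # Export the PG from a surviving replica (run with that OSD stopped):
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-42 \
        --journal-path /var/lib/ceph/osd/ceph-42/journal \
        --op export --pgid 1.2f3 --file /root/pg-1.2f3.export

    # Import it into the recreated, empty osd.87 (also stopped):
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-87 \
        --journal-path /var/lib/ceph/osd/ceph-87/journal \
        --op import --file /root/pg-1.2f3.export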