Hi,

I have an issue with a Ceph cluster which I can't resolve.

Due to an OSD failure a PG is incomplete, but I can't query the PG to see what I can do to fix it.

    health HEALTH_WARN
           1 pgs incomplete
           1 pgs stuck inactive
           1 pgs stuck unclean
           98 requests are blocked > 32 sec

$ ceph pg 3.117 query

That will hang forever.

$ ceph pg dump_stuck
pg_stat  state       up          up_primary  acting      acting_primary
3.117    incomplete  [68,55,74]  68          [68,55,74]  68

The primary OSD for this PG is osd.68. If I stop that OSD, the PG query works, but the output says that bringing osd.68 back online will probably help.

The 98 blocked requests are also on osd.68, and they all say:
- initiated
- reached_pg

The cluster is running Hammer 0.94.5 in this case.

From what I know, an OSD had a failing disk and was restarted a couple of times while the disk was giving errors. This caused the PG to become incomplete.

I've set debug osd to 20, but I can't really tell what is going wrong on osd.68 that causes it to stall for this long.

Any idea what to do here to get this PG up and running again?

Wido