Hello,

We recently had two nodes go down in our Ceph cluster. One was repaired, but the other had all 12 of its OSDs destroyed when it went down. After we brought everything back online, several PGs were showing as down and down+peering.

After marking the failed OSDs as lost and removing them from the cluster, we now have around 90 PGs showing as incomplete. At this point we just want to get the cluster back into a healthy state, so I tried recreating the PGs with force_create_pg, and now they are all stuck in creating. pg dump shows all 90 PGs with the same output:

2.182 0 0 0 0 0 0 0 0 creating 2015-10-14 10:31:28.832527 0'0 0:0 [] -1 [] -1 0'0 0.000000 0'0 0.000000

When I ran a pg query on one of the groups, I noticed that one of the failed OSDs was listed under "down_osds_we_would_probe". I have already removed that OSD from the cluster, and trying to mark it lost says the OSD does not exist.
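For reference, this is roughly the sequence of commands I ran (osd.12 and pg 2.182 are stand-ins here for the actual failed OSD ids and PG ids):

```shell
# Mark the failed OSD as lost, then remove it from the cluster
# (osd.12 is a placeholder for each of the destroyed OSDs).
ceph osd lost 12 --yes-i-really-mean-it
ceph osd crush remove osd.12
ceph auth del osd.12
ceph osd rm 12

# Try to recreate one of the incomplete PGs (2.182 is a placeholder).
ceph pg force_create_pg 2.182

# Query the PG; "down_osds_we_would_probe" in the JSON output still
# lists the removed OSD, but marking it lost again now fails because
# the OSD no longer exists in the OSD map.
ceph pg 2.182 query
ceph osd lost 12 --yes-i-really-mean-it
```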
Here is my crushmap:
http://pastebin.com/raw.php?i=vyk9vMT1

Why are the PGs still trying to probe OSDs that have been marked lost and removed from the cluster?
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com