On our dev cluster, I've got a PG that won't create. We had a host fail with 10 OSDs that needed to be rebuilt. A number of other OSDs were down for a few days (did I mention this was a dev cluster?). Those OSDs eventually came back up once the OSD maps caught up on them. I then rebuilt the OSDs on all the hosts because we were running into XFS lockups with bcache.

After all the hosts were rebuilt, there were a number of PGs that could not be found. I tried restarting all the OSDs and the MONs, and deep scrubbing both the OSDs they were on and the PGs themselves. I also ran a repair on the OSDs, without any luck. One of the pools had a health recommendation to increase its PG count, so I increased it, thinking it might help. Nothing was working and I could not find any reference to the missing PGs, so I force created them. That cleared up all of them except one, which is stuck in "creating" because of the new PG count. Now there is nothing I can do to unstick this one PG: I can't force create it, I can't increase pgp_num, nada.

At one point while recreating the OSDs, some of the OSD numbers got out of order, and to calm my OCD I "fixed" them. That required manually modifying the CRUSH map, since one OSD appeared under both hosts; this was before I increased the PG count.

There is nothing critical on this cluster, but I'm using it as an opportunity to understand Ceph in case we run into something similar in our future production environment.

HEALTH_WARN 1 pgs stuck inactive; 1 pgs stuck unclean; pool libvirt-pool pg_num 256 > pgp_num 128
pg 4.bf is stuck inactive since forever, current state creating, last acting [29,15,32]
pg 4.bf is stuck unclean since forever, current state creating, last acting [29,15,32]
pool libvirt-pool pg_num 256 > pgp_num 128

[root@nodea ~]# ceph-osd --version
ceph version 0.85 (a0c22842db9eaee9840136784e94e50fabe77187)

More output: http://pastebin.com/ajgpU7Zx

Thanks
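
P.S. In case it helps with diagnosis, this is roughly the sequence of commands I ran, reconstructed from memory, so the exact OSD IDs and file names are illustrative (the pool and PG are the ones from the health output above):

# scrub/repair attempts on the acting OSDs and the stuck PG
ceph osd deep-scrub 29
ceph osd repair 29
ceph pg deep-scrub 4.bf

# bumping the PG count on the pool that had the warning
ceph osd pool set libvirt-pool pg_num 256
ceph osd pool set libvirt-pool pgp_num 256   # this is the one that won't take now

# force creating the PGs that could not be found
ceph pg force_create_pg 4.bf

# manual CRUSH map edit after the OSD renumbering
ceph osd getcrushmap -o crush.bin
crushtool -d crush.bin -o crush.txt
# (edited crush.txt by hand to remove the duplicate OSD entry)
crushtool -c crush.txt -o crush.new
ceph osd setcrushmap -i crush.new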