On our dev cluster, I've got a PG that won't create. We had a host fail with 10 OSDs that needed to be rebuilt. A number of other OSDs were down for a few days (did I mention this was a dev cluster?). Those OSDs eventually came back up once the OSD maps caught up on them. I then rebuilt the OSDs on all the hosts because we were running into XFS lockups with bcache.

After all the hosts were rebuilt, there were a number of PGs that could not be found. I tried restarting all the OSDs and the MONs, and deep scrubbing both the OSDs they were on and the PGs themselves. I also ran a repair on the OSDs, without any luck. One of the pools had a health recommendation to increase its PG count, so I increased it, thinking it might help. Nothing was working and I could not find any reference to the missing PGs, so I force created them. That cleared up all of them except one, which is stuck in "creating" because of the new PG count. Now there is nothing I can do to unstick this one PG: I can't force create it, I can't increase pgp_num, nada.

At one point while recreating the OSDs, some of the OSD numbers got out of order, and to calm my OCD I "fixed" them. That required manually modifying the CRUSH map, since one OSD appeared under both hosts; this was before I increased the PG count.

There is nothing critical on this cluster, but I'm using it as an opportunity to understand Ceph in case we run into something similar in our future production environment.

HEALTH_WARN 1 pgs stuck inactive; 1 pgs stuck unclean; pool libvirt-pool pg_num 256 > pgp_num 128
pg 4.bf is stuck inactive since forever, current state creating, last acting [29,15,32]
pg 4.bf is stuck unclean since forever, current state creating, last acting [29,15,32]
pool libvirt-pool pg_num 256 > pgp_num 128

[root@nodea ~]# ceph-osd --version
ceph version 0.85 (a0c22842db9eaee9840136784e94e50fabe77187)

More output: http://pastebin.com/ajgpU7Zx

Thanks
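
P.S. In case it helps with diagnosis, this is roughly the sequence of commands I ran, reconstructed from memory, so the exact OSD IDs and file names are illustrative (the pool and PG are the ones from the health output above):

# scrub/repair attempts on the acting OSDs and the stuck PG
ceph osd deep-scrub 29
ceph osd repair 29
ceph pg deep-scrub 4.bf

# bumping the PG count on the pool that had the warning
ceph osd pool set libvirt-pool pg_num 256
ceph osd pool set libvirt-pool pgp_num 256   # this is the one that won't take now

# force creating the PGs that could not be found
ceph pg force_create_pg 4.bf

# manual CRUSH map edit after the OSD renumbering
ceph osd getcrushmap -o crush.bin
crushtool -d crush.bin -o crush.txt
# (edited crush.txt by hand to remove the duplicate OSD entry)
crushtool -c crush.txt -o crush.new
ceph osd setcrushmap -i crush.new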