On Tuesday, September 30, 2014, Robert LeBlanc <robert at leblancnet.us> wrote:

> On our dev cluster, I've got a PG that won't create. We had a host fail
> with 10 OSDs that needed to be rebuilt. A number of other OSDs were down
> for a few days (did I mention this was a dev cluster?). The other OSDs
> eventually came up once the OSD maps caught up on them. I rebuilt the
> OSDs on all the hosts because we were running into XFS lockups with
> bcache. There were a number of PGs that could not be found once all the
> hosts were rebuilt. I tried restarting all the OSDs and the MONs, and
> deep scrubbing the OSDs they were on as well as the PGs. I also ran a
> repair on the OSDs, without any luck. One of the pools had a
> recommendation to increase its PG count, so I increased it thinking it
> might help.
>
> Nothing was helping and I could not find any reference to the missing
> PGs, so I force-created them. That cleared up all but one, which is
> still creating because of the new PG count. Now there is nothing I can
> do to unstick this one PG: I can't force-create it, I can't increase
> pgp_num, nada. At one point while recreating the OSDs, some of the OSD
> numbers got out of order, and to calm my OCD I "fixed" them, which
> required manually modifying the CRUSH map because one OSD appeared under
> both hosts; this was before I increased the PG count.
>
> There is nothing critical on this cluster, but I'm using it as an
> opportunity to understand Ceph in case we run into something similar in
> our future production environment.
>
> HEALTH_WARN 1 pgs stuck inactive; 1 pgs stuck unclean; pool libvirt-pool
> pg_num 256 > pgp_num 128
> pg 4.bf is stuck inactive since forever, current state creating, last
> acting [29,15,32]
> pg 4.bf is stuck unclean since forever, current state creating, last
> acting [29,15,32]
> pool libvirt-pool pg_num 256 > pgp_num 128
> [root@nodea ~]# ceph-osd --version
> ceph version 0.85 (a0c22842db9eaee9840136784e94e50fabe77187)
>
> More output: http://pastebin.com/ajgpU7Zx
>
> Thanks

You should find out which OSD the PG maps to, and see if "ceph pg query"
or the OSD admin socket will expose anything useful about its state.
-Greg

-- 
Software Engineer #42 @ http://inktank.com | http://ceph.com
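
A minimal sketch of the checks Greg suggests, using pg 4.bf and the
acting set [29,15,32] from the status output above. The admin socket
path below is the default /var/run/ceph location and osd.29 is assumed
to be the primary; adjust both for your cluster.

Which OSDs the PG maps to:
    ceph pg map 4.bf

Detailed peering/creating state for the PG:
    ceph pg 4.bf query

Asking the assumed primary, osd.29, over its admin socket (use "help"
to list what the daemon will answer):
    ceph --admin-daemon /var/run/ceph/ceph-osd.29.asok help
    ceph --admin-daemon /var/run/ceph/ceph-osd.29.asok status

The "pg_num 256 > pgp_num 128" warning is normally cleared with

    ceph osd pool set libvirt-pool pgp_num 256

though that is exactly the step reported to be failing here.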