I rebuilt the primary OSD (29) in the hope it would unblock whatever it
was, but no luck. I'll check the admin socket and see if there is
anything I can find there.

On Tue, Sep 30, 2014 at 10:36 AM, Gregory Farnum <greg at inktank.com> wrote:
> On Tuesday, September 30, 2014, Robert LeBlanc <robert at leblancnet.us> wrote:
>
>> On our dev cluster, I've got a PG that won't create. We had a host fail
>> with 10 OSDs that needed to be rebuilt. A number of other OSDs were down
>> for a few days (did I mention this was a dev cluster?). The other OSDs
>> eventually came up once the OSD maps caught up on them. I rebuilt the
>> OSDs on all the hosts because we were running into XFS lockups with
>> bcache. There were a number of PGs that could not be found once all the
>> hosts were rebuilt. I tried restarting all the OSDs, the MONs, and
>> deep-scrubbing the OSDs they were on as well as the PGs. I performed a
>> repair on the OSDs as well, without any luck. One of the pools had a
>> recommendation to increase the PGs, so I increased it, thinking it might
>> help.
>>
>> Nothing was helping and I could not find any reference to them, so I
>> force-created them. That cleared up all but one, which is creating due
>> to the new PG number. Now there is nothing I can do to unstick this one
>> PG: I can't force-create it, I can't increase the pgp_num, nada. At one
>> point when recreating the OSDs, some of the numbers got out of order
>> and, to calm my OCD, I "fixed" it, which required me to manually modify
>> the CRUSH map because the OSD appeared in both hosts; this was before I
>> increased the PGs.
>>
>> There is nothing critical on this cluster, but I'm using this as an
>> opportunity to understand Ceph in case we run into something similar in
>> our future production environment.
>>
>> HEALTH_WARN 1 pgs stuck inactive; 1 pgs stuck unclean; pool libvirt-pool
>> pg_num 256 > pgp_num 128
>> pg 4.bf is stuck inactive since forever, current state creating, last
>> acting [29,15,32]
>> pg 4.bf is stuck unclean since forever, current state creating, last
>> acting [29,15,32]
>> pool libvirt-pool pg_num 256 > pgp_num 128
>> [root at nodea ~]# ceph-osd --version
>> ceph version 0.85 (a0c22842db9eaee9840136784e94e50fabe77187)
>>
>> More output: http://pastebin.com/ajgpU7Zx
>>
>> Thanks
>
> You should find out which OSD the PG maps to, and see if "ceph pg query"
> or the osd admin socket will expose anything useful about its state.
> -Greg
>
> --
> Software Engineer #42 @ http://inktank.com | http://ceph.com
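
For reference, a rough sketch of the checks Greg is suggesting, assuming
the stuck PG is 4.bf and its primary is osd.29 as shown in the health
output above (the admin socket path is the default for this era of Ceph;
adjust it if your cluster uses a non-default run directory):

  # Which OSDs does the PG map to in the current osdmap?
  ceph pg map 4.bf

  # Ask the acting set for the PG's internal state (peering info, etc.)
  ceph pg 4.bf query

  # Poke the primary's admin socket directly
  ceph --admin-daemon /var/run/ceph/ceph-osd.29.asok help
  ceph --admin-daemon /var/run/ceph/ceph-osd.29.asok status
  ceph --admin-daemon /var/run/ceph/ceph-osd.29.asok dump_ops_in_flight

The "force create" mentioned above presumably refers to the mon command
of that era, e.g. "ceph pg force_create_pg 4.bf". The separate
pg_num > pgp_num warning would normally be cleared by bringing pgp_num
up to match ("ceph osd pool set libvirt-pool pgp_num 256"), though that
on its own won't unstick a PG that is wedged in the creating state.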