Ahh figured it out. I hadn't removed the dead OSDs from the crush map,
which was apparently confusing ceph. I just did 'ceph osd crush rm XXX'
for all of them, restarted all the online OSDs, and the pg got created!

On 8/8/2014 4:51 PM, Brian Rak wrote:
> ceph version 0.80.4 (7c241cfaa6c8c068bc9da8578ca00b9f4fc7567f)
>
> I recently managed to cause some problems for one of our clusters, we
> had 1/3 of the OSDs fail and lose all the data.
>
> I removed all the failed OSDs from the crush map, and did 'ceph osd
> rm'. Once it finished recovering, I was left with a whole bunch of
> 'stale+active+clean' PGs. These had been hosted entirely on the OSDs
> that failed.
>
> So, there will be some data loss here. Luckily the majority of the
> data is easily replaceable. I couldn't do a whole lot with these PGs,
> so I ended up forcing ceph to recreate them, with:
>
> ceph health detail | grep pg | awk '{ print $2 }' | xargs -n1 ceph pg force_create_pg
>
> This fixed most of them, though I'm now left with one that's hanging
> on 'creating'. Any suggestions for what I can do? There isn't any
> data to lose in this pg, so I would be okay removing it, but I don't
> see any way to do that. How can I force the OSD to create it again?
>
>     cluster e312b58c-0391-43d0-98e6-25a41bea6a70
>      health HEALTH_WARN 1 pgs stuck inactive; 1 pgs stuck unclean
>      monmap e3: 3 mons at {snip}, election epoch 50, quorum 0,1,2 {snip}
>      osdmap e3922: 11 osds: 11 up, 11 in
>       pgmap v1261502: 4722 pgs, 14 pools, 4344 GB data, 3314 kobjects
>             8668 GB used, 11803 GB / 20472 GB avail
>                    1 creating
>                 4721 active+clean
>   client io 449 kB/s rd, 0 B/s wr, 643 op/s
>
> # ceph pg dump | grep creating
> dumped all in format plain
> 3.15c 0 0 0 0 0 0 0 creating 2014-08-08 16:18:38.781245 0'0 0:0 [4,2] 4 [2,4] 2 0'0 0.000000 0'0 0.000000
>
> # ceph pg 3.15c query
> Error ENOENT: i don't have pgid 3.15c
>
> # ceph pg 3.15c mark_unfound_lost revert
> Error ENOENT: i don't have pgid 3.15c
>
> If I try to force a scrub:
>
> 2014-08-08 16:41:38.016388 7f33270cd700 0 osd.2 3926 do_command r=0
> 2014-08-08 16:41:39.775253 7f33270cd700 0 osd.2 3926 do_command r=0
> 2014-08-08 16:41:42.491501 7f33270cd700 0 osd.2 3926 do_command r=0
> 2014-08-08 16:41:42.497906 7f33270cd700 0 osd.2 3926 do_command r=-2 i don't have pgid 3.15c
> 2014-08-08 16:41:42.497911 7f33270cd700 0 log [INF] : i don't have pgid 3.15c
>
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
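
For anyone hitting the same thing: the fix at the top of this message (a 'ceph osd crush rm' for every dead OSD) could be scripted roughly as below. This is only a sketch, not a tested procedure. It assumes that 'ceph osd tree' marks the dead OSDs as "down" or "DNE", and the helper names dead_osds and crush_rm_commands are made up for illustration. It only prints the commands, so the list can be checked before anything is removed.

```shell
#!/bin/sh
# Print the names (osd.N) of OSDs that `ceph osd tree` reports as down or DNE.
# Reads plain-text `ceph osd tree` output on stdin. The column layout differs
# between Ceph releases, so every field is scanned instead of assuming positions.
# Assumption: dead OSDs show up with a status field of exactly "down" or "DNE".
dead_osds() {
    awk '{
        name = ""; dead = 0
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^osd\.[0-9]+$/) name = $i
            if ($i == "down" || $i == "DNE") dead = 1
        }
        if (name != "" && dead) print name
    }'
}

# Dry run: print one `ceph osd crush rm` command per dead OSD rather than
# executing anything.
crush_rm_commands() {
    dead_osds | while read -r osd; do
        echo "ceph osd crush rm $osd"
    done
}

# Usage (assumes a working ceph CLI with admin credentials):
#   ceph osd tree | crush_rm_commands
```

Once the printed list looks right, the commands can be run by hand or piped to sh. The restart of the online OSDs mentioned above is not covered by this sketch.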