On Thu, Jun 30, 2016 at 11:34 PM, Brian Felton <bjfelton@xxxxxxxxx> wrote:
> Sure. Here's a complete query dump of one of the 30 pgs:
> http://pastebin.com/NFSYTbUP

Looking at that, something immediately stands out. There are a lot of
entries in "past_intervals" like so.

    "past_intervals": [
        {
            "first": 18522,
            "last": 18523,
            "maybe_went_rw": 1,
            "up": [
                2147483647,
    ...
            "acting": [
                2147483647,
                2147483647,
                2147483647,
                2147483647
            ],
            "primary": -1,
            "up_primary": -1

That value (2147483647 is the decimal form of 0x7fffffff) is defined in
src/crush/crush.h like so;

#define CRUSH_ITEM_NONE 0x7fffffff /* no result */

In other words, CRUSH returned no OSD at all for those slots in the up
and acting sets, so it looks like this could be due to a bad crush rule
(or at least a previously un-satisfiable rule).

Could you share the output from the following?

$ ceph osd crush rule ls

And, for each rule listed by the above command:

$ ceph osd crush rule dump [rule_name]

I'd then dump out the crushmap and test it, showing any bad mappings,
with the commands listed here;

http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/#crush-gives-up-too-soon
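Roughly, that boils down to something like the sketch below. I haven't
run this against your cluster, so treat it as a starting point:
crushmap.bin is just an arbitrary file name, the rule number comes from
the 'ceph osd crush rule dump' output above, and --num-rep would be 4
for your k=3 m=1 pool;

$ ceph osd getcrushmap -o crushmap.bin
$ crushtool -i crushmap.bin --test --show-bad-mappings \
      --rule <rule_id> --num-rep 4 --min-x 1 --max-x 10000

The --min-x/--max-x range just controls how many sample inputs
crushtool tries. Any lines in the --show-bad-mappings output are inputs
the rule could not map to a full set of 4 OSDs, which would line up
with the CRUSH_ITEM_NONE entries above.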
That should hopefully give some insight.

HTH,
Brad

>
> Brian
>
> On Wed, Jun 29, 2016 at 6:25 PM, Brad Hubbard <bhubbard@xxxxxxxxxx> wrote:
>>
>> On Thu, Jun 30, 2016 at 3:22 AM, Brian Felton <bjfelton@xxxxxxxxx> wrote:
>> > Greetings,
>> >
>> > I have a lab cluster running Hammer 0.94.6 and being used exclusively
>> > for object storage. The cluster consists of four servers running 60
>> > 6TB OSDs each. The main .rgw.buckets pool is using k=3 m=1 erasure
>> > coding and contains 8192 placement groups.
>> >
>> > Last week, one of our guys out-ed and removed one OSD from each of
>> > three of the four servers in the cluster, which resulted in some
>> > general badness (the disks were wiped post-removal, so the data are
>> > gone). After a proper education in why this is a Bad Thing, we got
>> > the OSDs added back. When all was said and done, we had 30 pgs that
>> > were stuck incomplete, and no amount of magic has been able to get
>> > them to recover. From reviewing the data, we knew that all of these
>> > pgs contained at least 2 of the removed OSDs; I understand and accept
>> > that the data are gone, and that's not a concern (yay lab).
>> >
>> > Here are the things I've tried:
>> >
>> > - Restarted all OSDs
>> > - Stopped all OSDs, removed all OSDs from the crush map, and started
>> >   everything back up
>> > - Executed a 'ceph pg force_create_pg <id>' for each of the 30 stuck
>> >   pgs
>> > - Executed a 'ceph pg send_pg_creates' to get the ball rolling on
>> >   creates
>> > - Executed several 'ceph pg <id> query' commands to ensure we were
>> >   referencing valid OSDs after the 'force_create_pg'
>> > - Ensured those OSDs were really removed (e.g. 'ceph auth del', 'ceph
>> >   osd crush remove', and 'ceph osd rm')
>>
>> Can you share some of the pg query output?
>>
>> >
>> > At this point, I've got the same 30 pgs that are stuck creating. I've
>> > run out of ideas for getting this back to a healthy state. In
>> > reviewing the other posts on the mailing list, the overwhelming
>> > solution was a bad OSD in the crush map, but I'm all but certain that
>> > isn't what's hitting us here. Normally, being the lab, I'd consider
>> > nuking the .rgw.buckets pool and starting from scratch, but we've
>> > recently spent a lot of time pulling 140TB of data into this cluster
>> > for some performance and recovery tests, and I'd prefer not to have
>> > to start that process again. I am willing to entertain most any other
>> > idea irrespective of how destructive it is to these PGs, so long as I
>> > don't have to lose the rest of the data in the pool.
>> >
>> > Many thanks in advance for any assistance here.
>> >
>> > Brian Felton
>> >
>> > _______________________________________________
>> > ceph-users mailing list
>> > ceph-users@xxxxxxxxxxxxxx
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>>
>> --
>> Cheers,
>> Brad
>
>

--
Cheers,
Brad
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com