The guide on migrating from filestore to bluestore was perfect. I was
able to get that OSD back up and running quickly. Thanks.

As for my PGs, I tried force-create-pg and it said it was working on it
for a while, and I saw some deep scrubs happening, but when they were
done it didn't help the incomplete problem. However, the
ceph-objectstore-tool approach seems to be working. For the people of
the future (which might well be me if I mess things up again), here's
the command I ran (from the node which hosts the OSD):

ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-11 \
    --pgid 2.0 --op mark-complete --no-mon-config

(A rough sketch of the full stop/mark-complete/restart sequence, and of
rebuilding a dead OSD, is appended after the quoted thread below.)

Thanks for your help, Alfredo & Paul. :-)

--Adam

On 6/27/19 11:05 AM, Alfredo Deza wrote:
> On Thu, Jun 27, 2019 at 10:36 AM ☣Adam <adam@xxxxxxxxx> wrote:
>
>     Well that caused some excitement (either that or the small power
>     disruption did)! One of my OSDs is now down because it keeps
>     crashing due to a failed assert (stacktraces attached; also I'm
>     apparently running mimic, not luminous).
>
>     In the past a failed assert on an OSD has meant removing the
>     disk, wiping it, re-adding it as a new one, and then having ceph
>     rebuild it from other copies of the data.
>
>     I did this all manually in the past, but I'm trying to get more
>     familiar with ceph's commands. Will the following commands do the
>     same?
>
>     ceph-volume lvm zap --destroy --osd-id 11
>     # Presumably that has to be run from the node with OSD 11, not
>     # just any ceph node?
>     # Source: http://docs.ceph.com/docs/mimic/ceph-volume/lvm/zap
>
> That looks correct, and yes, you would need to run it on the node
> with OSD 11.
>
>     Do I need to remove the OSD (ceph osd out 11; wait for
>     stabilization; ceph osd purge 11) before I do this, and run
>     "ceph-deploy osd create" afterwards?
>
> I think what you need is essentially the same as the guide for
> migrating from filestore to bluestore:
>
> http://docs.ceph.com/docs/mimic/rados/operations/bluestore-migration/
>
>     Thanks,
>     Adam
>
>     On 6/26/19 6:35 AM, Paul Emmerich wrote:
>     > Have you tried: ceph osd force-create-pg <pgid>?
>     >
>     > If that doesn't work: use objectstore-tool on the OSD (while
>     > it's not running) and use it to force mark the PG as complete.
>     > (Don't know the exact command off the top of my head.)
>     >
>     > Caution: these are obviously really dangerous commands
>     >
>     > Paul
>     >
>     > --
>     > Paul Emmerich
>     >
>     > Looking for help with your Ceph cluster? Contact us at
>     > https://croit.io
>     >
>     > croit GmbH
>     > Freseniusstr. 31h
>     > 81247 München
>     > www.croit.io
>     > Tel: +49 89 1896585 90
>     >
>     > On Wed, Jun 26, 2019 at 1:56 AM ☣Adam <adam@xxxxxxxxx> wrote:
>     >
>     >     How can I tell ceph to give up on "incomplete" PGs?
>     >
>     >     I have 12 PGs which are "inactive, incomplete" and won't
>     >     recover. I think this is because in the past I have
>     >     carelessly pulled disks too quickly without letting the
>     >     system recover. I suspect the disks that hold the data for
>     >     these are long gone.
>     >
>     >     Whatever the reason, I want to fix it so I have a clean
>     >     cluster, even if that means losing data.
>     >
>     >     I went through the "troubleshooting pgs" guide[1], which is
>     >     excellent, but it didn't get me to a fix.
>     >
>     >     The output of `ceph pg 2.0 query` includes this:
>     >
>     >         "recovery_state": [
>     >             {
>     >                 "name": "Started/Primary/Peering/Incomplete",
>     >                 "enter_time": "2019-06-25 18:35:20.306634",
>     >                 "comment": "not enough complete instances of this PG"
>     >             },
>     >
>     >     I've already restarted all OSDs in various orders, and I
>     >     changed min_size to 1 to see if that would allow them to
>     >     get fixed, but no such luck. These pools are not erasure
>     >     coded and I'm using the Luminous release.
>     >
>     >     How can I tell ceph to give up on these PGs? There's
>     >     nothing identified as unfound, so mark_unfound_lost doesn't
>     >     help. I feel like `ceph osd lost` might be it, but at this
>     >     point the OSD numbers have been reused for new disks, so
>     >     I'd really like to limit the damage to the 12 PGs which are
>     >     incomplete if possible.
>     >
>     >     Thanks,
>     >     Adam
>     >
>     >     [1]
>     >     http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/
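
P.S. for the archives: a minimal sketch of the whole mark-complete
sequence as I understand it. It assumes a systemd deployment (so the
OSD unit is ceph-osd@11); the noout flag and the export step are extra
precautions added here rather than anything the tool requires, and the
export file path is only an example. Untested exactly as written, so
adjust IDs and paths before copying:

# keep the cluster from rebalancing while the OSD is briefly down
ceph osd set noout

# stop the OSD that holds the stuck PG (run on that OSD's host)
systemctl stop ceph-osd@11

# optional safety net: export the PG before modifying it
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-11 \
    --pgid 2.0 --op export --file /root/pg-2.0.export

# force the PG to be treated as complete on this OSD
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-11 \
    --pgid 2.0 --op mark-complete --no-mon-config

# bring the OSD back and let the cluster settle
systemctl start ceph-osd@11
ceph osd unset noout
ceph pg 2.0 query   # check that the PG peers and goes active

Once the OSD is back up the PG should peer and go active; if it
doesn't, the exported copy at least gives you something to fall back
on.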
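
P.P.S. for anyone who also needs to rebuild a dead OSD: a rough sketch
of the mark-out-and-replace flow from the bluestore migration guide
Alfredo linked, adapted to the zap-by-osd-id form discussed above.
/dev/sdX is a placeholder for the real device, 11 is the OSD id from
this thread, and the ordering should be double-checked against the
guide before running any of it:

# drain the broken OSD and wait until it is safe to remove
ceph osd out 11
while ! ceph osd safe-to-destroy osd.11 ; do sleep 60 ; done
systemctl stop ceph-osd@11

# wipe the old LVM volumes, then remove the OSD from the cluster
# while keeping its id available for reuse
ceph-volume lvm zap --destroy --osd-id 11
ceph osd destroy 11 --yes-i-really-mean-it

# recreate a bluestore OSD on the replacement device, reusing the id
ceph-volume lvm create --bluestore --data /dev/sdX --osd-id 11

Unlike the purge mentioned earlier in the thread, destroy keeps the
OSD id around so the new disk can take over the same slot.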