Hi,

sort of. We've force-created all PGs, set osd_find_best_info_ignore_history_les
for every OSD, restarted all OSDs, ran pg-repairs for everything that was still
broken, and then unset osd_find_best_info_ignore_history_les and restarted the
OSDs again.

That worked for us, but it caused total data loss for every affected PG that did
not have 0 objects. That was acceptable since it's a test setup; I would not
want to do this on a live cluster. (Rough command sketches for the steps
discussed in this thread are appended at the bottom.)

Cheers,
Hartwig Hauschild

Am 07.02.2020 schrieb Ragan, Tj (Dr.):
> Hi Hartwig,
>
> I'm in a similar situation, did you ever get this fixed?
>
> -TJ
>
> On 29 Jan 2020, at 11:16, Hartwig Hauschild <ml-ceph@xxxxxxxxxxxx> wrote:
>
> Hi,
>
> I had looked at the output of `ceph health detail`, which told me to search
> for 'incomplete' in the docs. Since that said to file a bug (and I was sure
> that filing a bug would not help), I went on to purge the disks that we had
> overwritten. Ceph then did some magic and told me that the PGs were again
> available on three OSDs, but still incomplete.
>
> I have now gone ahead and marked all three OSDs holding one of my incomplete
> PGs (according to `ceph pg ls incomplete`) as lost, one by one, waiting for
> ceph status to settle in between, and that led to the PG now being
> incomplete on three different OSDs.
> Also, force-create-pg tells me "already created".
>
> Am 29.01.2020 schrieb Gregory Farnum:
> There should be docs on how to mark an OSD lost, which I would expect to be
> linked from the troubleshooting-PGs page.
>
> There is also a command to force-create PGs, but I don't think that will
> help in this case since you already have at least one copy.
>
> On Tue, Jan 28, 2020 at 5:15 PM Hartwig Hauschild <ml-ceph@xxxxxxxxxxxx>
> wrote:
>
> Hi,
>
> before I descend into what happened and why: I'm talking about a test
> cluster, so I don't really care about the data in this case.
>
> We've recently started upgrading from Luminous to Nautilus, and for us that
> means we're retiring ceph-disk in favour of ceph-volume with LVM and
> dmcrypt.
>
> Our setup runs in containers, and we've got DBs separated from data.
> While testing our upgrade path we discovered that running the host on
> ubuntu-xenial and the containers on centos-7.7 leads to LVM inside the
> containers not using lvmetad because it is too old. That in turn means that
> not running `vgscan --cache` on the host before adding an LV to a VG
> essentially zeros the metadata for all LVs in that VG.
>
> That happened on two out of three hosts for a bunch of OSDs, and those OSDs
> are gone. I have no way of getting them back; they've been overwritten
> multiple times while we tried to figure out what went wrong.
>
> So now I have a cluster with 16 PGs in 'incomplete', 14 of them with 0
> objects and 2 with about 150 objects each.
>
> I have found a couple of howtos that tell me to use ceph-objectstore-tool
> to find the PGs on the active OSDs, and I've given that a try, but
> ceph-objectstore-tool always tells me it can't find the PG I'm looking for.
>
> Can I tell Ceph to re-init the PGs? Do I have to delete the pools and
> recreate them?
>
> There's no data in there that I can't get back; I just don't feel like
> scrapping and redeploying the whole cluster.
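
Appended: a minimal sketch of the recovery procedure described at the top of
this thread. The PG id is a placeholder, and the use of `ceph config set` plus
systemd for restarts is an assumption; a containerised deployment will restart
its OSDs differently.

  # 1. let OSDs choose a PG copy even if last_epoch_started does not match history
  ceph config set osd osd_find_best_info_ignore_history_les true
  systemctl restart ceph-osd.target        # restart all OSDs on every host

  # 2. force-create any PG that stays incomplete (placeholder PG id 1.2f)
  ceph osd force-create-pg 1.2f --yes-i-really-mean-it

  # 3. repair whatever is still reported as broken
  ceph pg repair 1.2f

  # 4. unset the flag and restart the OSDs again
  ceph config set osd osd_find_best_info_ignore_history_les false
  systemctl restart ceph-osd.target

As noted above, every affected PG that actually held objects came back empty,
so this is a last resort for a cluster whose data is expendable.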
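
The mark-as-lost step from the 29 Jan mail maps roughly to the commands below;
OSD and PG ids are placeholders.

  ceph pg ls incomplete                      # incomplete PGs and the OSDs they peer on
  # the OSD has to be down before `ceph osd lost` is accepted
  ceph osd lost 12 --yes-i-really-mean-it    # declare osd.12 permanently gone
  ceph -s                                    # let peering settle before the next OSD
  ceph osd force-create-pg 1.2f --yes-i-really-mean-it   # recreate the PG empty if no usable copy is left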
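
The lvmetad mismatch described in the original mail suggests a workaround along
these lines. This is only a sketch of the idea; the VG and LV names and sizes
are made up.

  # on the host, before any in-container LVM operation touches the VG:
  vgscan --cache                     # rescan devices and refresh lvmetad's view
  lvs -o +lv_tags ceph-block-vg      # sanity check: all OSD LVs still visible?
  # only then create the new LV:
  lvcreate -n osd-block-new -L 100G ceph-block-vg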
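
The ceph-objectstore-tool attempt from the original mail would look roughly
like this; data paths, OSD ids and the PG id are placeholders, and the OSDs
involved have to be stopped while the tool runs.

  systemctl stop ceph-osd@7
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-7 \
      --op list-pgs | grep 1.2f      # does this OSD still hold a copy of the PG?

  # if a copy shows up, it can be exported and imported into an acting OSD:
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-7 \
      --pgid 1.2f --op export --file /tmp/pg-1.2f.export
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-3 \
      --op import --file /tmp/pg-1.2f.export
  systemctl start ceph-osd@7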