Glad to hear that you were able to recover the OSD!
Quoting Michael Fladischer <michael@xxxxxxxx>:
Hi Eugen,
On 26.08.2020 at 11:47, Eugen Block wrote:
I don't know if the Ceph version is relevant here, but I could undo
that quite quickly in my small test cluster (Octopus native, no
Docker).
After the OSD was marked as "destroyed" I recreated the auth caps
for that OSD_ID (marking it as destroyed removes the cephx keys etc.),
changed the keyring in /var/lib/ceph/osd/ceph-1/keyring to match,
and restarted the OSD; now it's up and in again. Is the OSD in your
case actually up and running?
My cluster is running Octopus too, and your hint regarding the auth
caps put me on the right track to get the OSD back online.
For anyone who ends up in the same situation, here is what I did
(assuming osd.95 is the destroyed OSD and only `ceph osd destroy
...` was invoked, no `ceph-volume lvm zap ...`). Commands should be
run on the node where the destroyed OSD resides:
1. Make a copy of the OSD's keyring file:
cp /var/lib/ceph/osd/ceph-95/keyring ~/keyring.osd.95
2. Add the following "caps" lines to the copy ~/keyring.osd.95, so
that the file looks like this in the end (leave your key intact):
[osd.95]
key = <osd-key>
caps mgr = "allow profile osd"
caps mon = "allow profile osd"
caps osd = "allow *"
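If you are unsure about these caps, you can compare them against the
auth entry of any healthy OSD in the cluster (osd.94 here is just a
placeholder ID):
ceph auth get osd.94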
3. Now reimport the OSD keyring file:
ceph auth import -i ~/keyring.osd.95
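You can verify the import with:
ceph auth get osd.95
It should now show the key together with the mgr/mon/osd caps from
step 2.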
4. Create a new OSD, replacing the destroyed one:
ceph osd new $(cat /var/lib/ceph/osd/ceph-95/fsid) 95
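After this, osd.95 should appear in the OSD map again as "down"
rather than "destroyed", which you can check with:
ceph osd tree | grep osd.95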
5. Start the OSD again:
systemctl start ceph-osd@95.service
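If the OSD doesn't come up cleanly, the service status and journal
are the first places to look, and `ceph -s` shows the recovery
progress once it is up:
systemctl status ceph-osd@95.service
journalctl -u ceph-osd@95.service
ceph -s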
Now the OSD should rejoin the cluster and everything should be back
to normal. At least that's how it went for me, and it fixed my
"incomplete PG" issue.
Regards,
Michael
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx