Hi Eugen,
On 26.08.2020 at 11:47, Eugen Block wrote:
> I don't know if the ceph version is relevant here but I could undo that
> quite quickly in my small test cluster (Octopus native, no docker).
> After the OSD was marked as "destroyed" I recreated the auth caps for
> that OSD_ID (marking as destroyed removes cephx keys etc.), changed the
> keyring in /var/lib/ceph/osd/ceph-1/keyring to reflect that and
> restarted the OSD, now it's up and in again. Is the OSD in your case
> actually up and running?
My cluster is running Octopus too, and your hint regarding the auth caps
put me on the right track to get the OSD back online.
For anyone who ends up in the same situation, here is what I did
(assuming osd.95 is the destroyed OSD and only `ceph osd destroy ...`
was invoked, not `ceph-volume lvm zap ...`). Commands should be run on
the node where the destroyed OSD resides:
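As a quick sanity check before step 1 (same osd.95 as above; adjust the
id for your cluster), the OSD should still be listed, with "destroyed"
in the STATUS column:
ceph osd tree | grep -w osd.95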
1. Make a copy of the OSD's keyring file:
cp /var/lib/ceph/osd/ceph-95/keyring ~/keyring.osd.95
2. Add the following "caps" lines to the copy ~/keyring.osd.95, so that
the file looks like this in the end (leave your key intact):
[osd.95]
        key = <osd-key>
        caps mgr = "allow profile osd"
        caps mon = "allow profile osd"
        caps osd = "allow *"
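If you are not sure what the caps should look like, you can compare
against the auth entry of any healthy OSD in the cluster (osd.94 below
is just a placeholder for some intact OSD id):
ceph auth get osd.94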
3. Now reimport the OSD keyring file:
ceph auth import -i ~/keyring.osd.95
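A quick check that the import took effect; the entry for osd.95 should
now show the three caps lines from above:
ceph auth get osd.95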
4. Create a new OSD, replacing the destroyed one:
ceph osd new $(cat /var/lib/ceph/osd/ceph-95/fsid) 95
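After this the OSD should no longer carry the "destroyed" flag in the
OSD map; you can verify with:
ceph osd dump | grep -w osd.95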
5. Start the OSD again:
systemctl start ceph-osd@95.service
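To confirm it came back up:
systemctl status ceph-osd@95.service
ceph osd tree | grep -w osd.95
The OSD should be reported as "up" again shortly after the start.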
Now the OSD should rejoin the cluster and everything should be back to
normal. At least that is what happened for me, and it fixed my
"incomplete PG" issue.
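If you want to keep an eye on the recovery afterwards, the usual status
commands are enough, e.g.:
ceph -s
ceph health detail
ceph pg ls incomplete
The last command lists any PGs that are still in the "incomplete" state.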
Regards,
Michael