Re: Accidentally created systemd units for OSDs

Hi,

> When things settle down, I *MIGHT* put in an RFE to change the default for ceph-volume to --no-systemd to save someone else from this anguish.

Note that there are still users/operators/admins who don't use containers, so changing the ceph-volume default might not be the best idea.

Regarding the cleanup, this [1] was the thread Tim was referring to. I would set the noout flag, stop an OSD (so the device won't be busy anymore), make sure that both ceph-osd@{OSD_ID} and ceph-{FSID}@osd.{OSD_ID} are stopped, then double-check that everything you need is still under /var/lib/ceph/{FSID}/osd.{OSD_ID}, like configs and keyrings. Disable the ceph-osd@{OSD_ID} unit (as already pointed out), then check whether the orchestrator can start the OSD via systemd:

ceph orch daemon start osd.{OSD_ID}

or alternatively, try it manually:

systemctl reset-failed
systemctl start ceph-{FSID}@osd.{OSD_ID}

Watch the log for that OSD to identify any issues. If it works, unset the noout flag. You might want to ensure it also works after a reboot, though. I don't think it should be necessary to redeploy the OSDs, but the cleanup has to be done properly. For guidance, you can look through the cephadm tool's source for the "adopt" function, which migrates the contents of pre-cephadm daemons into the FSID-specific directories.
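
To make the order of operations easier to review, here is a rough sketch that only prints the steps for one OSD and checks the cephadm directory, without running anything against the cluster. The function names (check_osd_dir, print_cleanup_plan) are made up for this mail, and the file names "config" and "keyring" are my assumption of what "configs and keyrings" looks like on disk, so adjust them to what you actually find there.

```shell
# Sketch only: nothing here touches the cluster or systemd.

# Report which expected files are missing from a cephadm OSD directory.
# Assumption: the files of interest are named "config" and "keyring".
check_osd_dir() {
    local dir="$1" missing=0 f
    for f in config keyring; do
        if [ ! -e "$dir/$f" ]; then
            echo "missing: $dir/$f"
            missing=1
        fi
    done
    return "$missing"
}

# Print the commands for one OSD in the order described above.
# $1 = cluster FSID, $2 = OSD id (both supplied by the caller).
print_cleanup_plan() {
    local fsid="$1" osd_id="$2"
    cat <<EOF
ceph osd set noout
systemctl stop ceph-osd@${osd_id}
systemctl stop ceph-${fsid}@osd.${osd_id}
# verify contents of /var/lib/ceph/${fsid}/osd.${osd_id} before continuing
systemctl disable ceph-osd@${osd_id}
ceph orch daemon start osd.${osd_id}
# only once the OSD is up and its log looks clean:
ceph osd unset noout
EOF
}
```

For example, print_cleanup_plan {FSID} 11 prints the plan for osd.11, and check_osd_dir /var/lib/ceph/{FSID}/osd.11 returns non-zero and lists whatever is missing. I would run the printed commands by hand, one at a time, rather than piping them into a shell.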

Regards,
Eugen

[1] https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/message/K2R3MXRD3S2DSXCEGX5IPLCF5L3UUOQI/

Quoting Dan O'Brien <dobrie2@xxxxxxx>:

OK... I've been in the Circle of Hell where systemd lives and I *THINK* I have convinced myself I'm OK. I *REALLY* don't want to trash and rebuild the OSDs.

In the manpage for systemd.unit, I found:

UNIT GARBAGE COLLECTION
The system and service manager loads a unit's configuration automatically when a unit is referenced for the first time. It will automatically unload the unit configuration and state again when the unit is not needed anymore ("garbage collection").

I've disabled the systemd units (which removes the symlink from the target) for the non-cephadm OSDs I created by mistake and I'm PRETTY SURE if I wait long enough (or reboot) that I won't see them any more, since there won't be a unit for systemd to care about.

I *WILL* have to clean up /var/lib/ceph/osd eventually. I tried just now, but it says "device busy." I think that's because there's some OTHER systemd cruft that shows a mount:
[root@ceph02 ~]# systemctl --all | grep ceph | grep mount
var-lib-ceph-osd-ceph\x2d11.mount loaded active mounted /var/lib/ceph/osd/ceph-11
var-lib-ceph-osd-ceph\x2d25.mount loaded active mounted /var/lib/ceph/osd/ceph-25
var-lib-ceph-osd-ceph\x2d9.mount loaded active mounted /var/lib/ceph/osd/ceph-9
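
A small filter along these lines could pull just the mount points out of that listing, so they can be rechecked after the OSDs have been stopped. This is a sketch: the function name legacy_osd_mounts is made up, and it assumes one unit per line with the mount point in the fifth column, as systemctl prints it for these units.

```shell
# Read a systemctl unit listing on stdin and print the mount point of every
# legacy (non-cephadm) OSD mount unit, i.e. units named var-lib-ceph-osd-*.mount.
# Assumption: columns are UNIT LOAD ACTIVE SUB DESCRIPTION, and the description
# of these mount units is the mount point itself.
legacy_osd_mounts() {
    awk '$1 ~ /^var-lib-ceph-osd-.*\.mount$/ && $4 == "mounted" { print $5 }'
}

# Possible usage on a host (not run here):
#   systemctl list-units --type=mount --all --no-legend --plain | legacy_osd_mounts
```

If the list is empty once the legacy OSD units are stopped, removing the leftovers under /var/lib/ceph/osd should no longer hit "device busy"; if entries remain, those are the mounts still holding the directories.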

When things settle down, I *MIGHT* put in an RFE to change the default for ceph-volume to --no-systemd to save someone else from this anguish.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx

