I wanted to respond to the original thread I saw archived on this topic, but I wasn't subscribed to the mailing list yet, so I don't have the thread in my inbox to reply to. Hopefully those involved in that thread still see this.

This issue looks the same as https://tracker.ceph.com/issues/51027, which is being worked on. Essentially, it seems that hosts that were being rebooted were temporarily marked as offline, and cephadm had an issue where it would try to remove all daemons (other than OSDs, I believe) from offline hosts. The pre-remove step for a monitor is to remove it from the monmap, so that step would go through, but the daemon itself would not be removed because the host was temporarily inaccessible due to the reboot. When the host came back up, the mon was restarted, but since it had already been removed from the monmap it got stuck in a "stopped" state. A fix that stops cephadm from trying to remove daemons from offline hosts is in the works.

A temporary workaround for now, as mentioned by Harry on that tracker, is to get cephadm to actually remove the mon daemon by changing the placement spec so that it no longer includes the host with the broken mon. Then wait until the mon daemon has been removed, and finally put the placement spec back to how it was so the mon gets redeployed (and now hopefully runs normally).
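
For anyone who wants the workaround spelled out, here is a rough sketch of the three steps as a small Python helper. It only builds and prints the commands, it does not run anything. It assumes your mon service is placed by an explicit host list (not by label or count), and the hostnames host1..host3 and the function name are made up for illustration. The commands themselves (`ceph orch apply mon --placement=...` and `ceph orch ps`) are the usual cephadm orchestrator ones, but please check them against your release before using them.

```python
import shlex


def mon_redeploy_commands(all_hosts, broken_host):
    """Build the command sequence for the workaround (hypothetical helper).

    all_hosts:   hosts currently in the mon placement (assumed explicit list)
    broken_host: host whose mon is stuck in the "stopped" state
    """
    if broken_host not in all_hosts:
        raise ValueError(f"{broken_host} is not in the current placement")
    remaining = [h for h in all_hosts if h != broken_host]
    if not remaining:
        raise ValueError("refusing to build a placement with no mon hosts")
    return [
        # 1. Drop the broken host from the placement so cephadm removes the mon.
        ["ceph", "orch", "apply", "mon", "--placement=" + ",".join(remaining)],
        # 2. Re-run this until the mon on the broken host is no longer listed.
        ["ceph", "orch", "ps"],
        # 3. Restore the original placement so the mon gets redeployed.
        ["ceph", "orch", "apply", "mon", "--placement=" + ",".join(all_hosts)],
    ]


if __name__ == "__main__":
    # Example hostnames only; substitute your own.
    for cmd in mon_redeploy_commands(["host1", "host2", "host3"], "host2"):
        print(shlex.join(cmd))
```

Step 2 is deliberately manual: wait until the broken mon has really disappeared from the daemon list before restoring the placement, otherwise cephadm may not redeploy it.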