Thanks Adam,

'ceph mgr fail' didn't end up working for me, but it did lead down the
path to getting things working. It turned out one of the managers was
broken somehow -- not the manager that appeared to have the stray host,
but the other one. There also seems to be an issue with running
'cephadm shell' on a machine while draining it or putting it into
maintenance. By not running 'cephadm shell' on the managers while
draining/undraining them (the second manager also needed an explicit
'ceph orch daemon rm --force'), the daemon ghosts in the machine are
gone (at least for now :) ).

-Mike

>>>>> On Wed, 1 Feb 2023 16:00:18 -0500, Adam King <adking@xxxxxxxxxx> said:

AK> I know there's a bug where, when downsizing by multiple mons at once
AK> through cephadm, this ghost/stray mon daemon thing can end up
AK> happening (I think something about cephadm removing them too quickly
AK> in succession, not totally sure). In those cases, just doing a mgr
AK> failover ("ceph mgr fail") always cleared the warnings after a couple
AK> of minutes. That might be worth a try if you haven't done so already
AK> and you have at least two mgr daemons in the cluster.

AK> On Wed, Feb 1, 2023 at 3:56 PM <ceph@xxxxxxxxxxxxxxx> wrote:

>> Hi All,
>>
>> I'm getting this error while setting up a Ceph cluster. I'm relatively
>> new to Ceph, so there is no telling what kind of mistakes I've been
>> making. I'm using cephadm, Ceph v16, and I apparently have a stray
>> daemon. But it also doesn't seem to exist, and I can't get Ceph to
>> forget about it.
>>
>> $ ceph health detail
>> [WRN] CEPHADM_STRAY_DAEMON: 1 stray daemon(s) not managed by cephadm
>>     stray daemon mon.cmon01 on host cmgmt01 not managed by cephadm
>>
>> mon.cmon01 also shows up in dashboard->hosts as running on cmgmt01. It
>> does not show up in the monitors section, though.
>>
>> But there isn't a monitor daemon running on that machine at all (no
>> podman container, not in the process list, not listening on a port).
>>
>> On that host, in cephadm shell:
>> # ceph orch daemon rm mon.cmon01 --force
>> Error EINVAL: Unable to find daemon(s) ['mon.cmon01']
>>
>> I don't currently have any real data on the cluster, so I've also
>> tried deleting the existing pools (except device_health_metrics) in
>> case Ceph was connecting that monitor to one of the pools.
>>
>> I'm not sure what to try next in order to get Ceph to forget about
>> that daemon.

--
Michael Baer
ceph@xxxxxxxxxxxxxxx
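
For anyone hitting the same CEPHADM_STRAY_DAEMON warning, here is a
condensed sketch of the steps discussed in this thread. It is only a
sketch: the mgr daemon name below (mgr.cmgmt02.abcdef) and the mgr count
are placeholders, so substitute whatever 'ceph orch ps' reports on your
cluster, and run the orch commands from a cephadm shell on a host that is
not currently being drained.

    # 1) Fail over to the standby mgr; the newly active mgr rebuilds its
    #    inventory, which often clears a stale CEPHADM_STRAY_DAEMON warning.
    ceph mgr fail
    ceph health detail

    # 2) If a mgr still looks wedged, remove it explicitly and let cephadm
    #    redeploy it (placeholder daemon name and count below).
    ceph orch ps
    ceph orch daemon rm mgr.cmgmt02.abcdef --force
    ceph orch apply mgr 2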