Re: CEPHADM_STRAY_DAEMON does not exist, how do I remove knowledge of it from ceph?

Michael Baer <ceph@xxxxxxxxxxxxxxx> · Wed, 01 Feb 2023 15:06:46 -0800

Thanks Adam,

'ceph mgr fail' didn't end up working for me, but it did lead me down the
path to getting it working. It looks like one of the managers was broken
somehow, though not the one on the host that appeared to have the stray
daemon; it was the other one. There also seems to be an issue with
running 'cephadm shell' on a machine while draining it or putting it into
maintenance. Once I stopped running 'cephadm shell' on the managers while
draining/undraining them (the second manager also needed an explicit
'orch daemon rm --force'), the daemon ghosts in the machine went away.

(at least for now :) ).
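
For anyone finding this in the archive, here is a rough sketch of the sequence described above, meant to be run from an admin node rather than from 'cephadm shell' on the host being drained. The run helper, the clear_stray_mgr function, and the daemon name mgr.cmgmt01.example are placeholders I made up for illustration; the exact order of steps and the use of the _no_schedule label to undrain are assumptions, not a record of what was actually typed. It assumes a release where 'ceph orch host drain' is available. Note that draining a host removes every daemon on it, OSDs included, so this only makes sense on a cluster without real data or on a host that carries only mgr/mon daemons.

```shell
#!/bin/sh
# Dry run by default: each command is printed, not executed.
# Set APPLY=1 to run the commands against a live cluster.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# Usage: clear_stray_mgr HOST DAEMON
#   HOST   - mgr host to drain and re-enable (e.g. cmgmt01 from this thread)
#   DAEMON - daemon name as listed by 'ceph orch ps' (placeholder below)
clear_stray_mgr() {
    host=$1
    daemon=$2
    run ceph orch ps &&                                  # what cephadm thinks is deployed
    run ceph orch host drain "$host" &&                  # move daemons off the host
    run ceph orch daemon rm "$daemon" --force &&         # explicit removal if the daemon lingers
    run ceph orch host label rm "$host" _no_schedule &&  # undrain: allow scheduling again
    run ceph mgr fail &&                                 # fail over to the standby mgr
    run ceph health detail                               # warning may take a few minutes to clear
}

# Hypothetical daemon name; take the real one from 'ceph orch ps'.
clear_stray_mgr cmgmt01 mgr.cmgmt01.example
```

Without APPLY=1 this only prints the six commands, so the plan can be reviewed before anything touches the cluster.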

-Mike

>>>>> On Wed, 1 Feb 2023 16:00:18 -0500, Adam King <adking@xxxxxxxxxx> said:

    AK> I know there's a bug where downsizing by multiple mons at once through
    AK> cephadm can leave behind a ghost stray mon daemon like this (I think
    AK> cephadm removes them too quickly in succession, but I'm not
    AK> totally sure). In those cases, just doing a mgr failover ("ceph mgr fail")
    AK> always cleared the warnings after a couple of minutes. That might be worth a
    AK> try if you haven't done so already and you have at least two mgr daemons in
    AK> the cluster.

    AK> On Wed, Feb 1, 2023 at 3:56 PM <ceph@xxxxxxxxxxxxxxx> wrote:

    >> Hi All,
    >> 
    >> I'm getting this error while setting up a ceph cluster. I'm relatively new
    >> to ceph, so there is no telling what kind of mistakes I've been making. I'm
    >> using cephadm, ceph v16 and I apparently have a stray daemon. But it also
    >> doesn't seem to exist and I can't get ceph to forget about it.
    >> 
    >> $ ceph health detail
    >> [WRN] CEPHADM_STRAY_DAEMON: 1 stray daemon(s) not managed by cephadm
    >> stray daemon mon.cmon01 on host cmgmt01 not managed by cephadm
    >> 
    >> mon.cmon01 also shows up in dashboard->hosts as running on cmgmt01. It
    >> does not show up in the monitors section though.
    >> 
    >> But, there isn't a monitor daemon running on that machine at all (no
    >> podman container, not in process list, not listening on a port).
    >> 
    >> On that host in cephadm shell,
    >> # ceph orch daemon rm mon.cmon01 --force
    >> Error EINVAL: Unable to find daemon(s) ['mon.cmon01']
    >> 
    >> I don't currently have any real data on the cluster, so I've also tried
    >> deleting the existing pools (except device_health_metrics) in case ceph was
    >> connecting that monitor to one of the pools.
    >> 
    >> I'm not sure what to try next in order to get ceph to forget about that
    >> daemon.
    >> _______________________________________________
    >> ceph-users mailing list -- ceph-users@xxxxxxx
    >> To unsubscribe send an email to ceph-users-leave@xxxxxxx
    >> 
    >> 

-- 
Michael Baer
ceph@xxxxxxxxxxxxxxx