Re: Removed daemons listed as stray

Vladimir Brik <vladimir.brik@xxxxxxxxxxxxxxxx> · Fri, 28 Jan 2022 15:00:04 -0600

Thanks very much! "ceph mgr fail" did the trick.

It's weird: I thought I had rebooted both managers since the 
problem occurred, but maybe they rebooted too quickly and no 
failover actually happened.

Vlad

On 1/28/22 13:31, Adam King wrote:
Hello Vlad,

Just some insight into how CEPHADM_STRAY_DAEMON works: this 
health warning is specifically designed to point out daemons 
in the cluster that cephadm is not aware of or in control of. 
It does this by comparing the daemons it has cached info on 
(this cached info is what you see in "ceph orch ps") with 
the return value of a core mgr function designed to list the 
servers in the cluster and what daemons are on them. This 
function, from cephadm's point of view, is a bit of a black 
box (by design, as it is meant to find daemons cephadm is 
not aware of or in control of). If you'd like to see a rough 
estimate of what that looks like, I'd check the output of 
"ceph node ls" (you may see your non-existent osds listed 
there).

This means that a daemon that does not exist, but that 
cephadm is falsely reporting as stray, typically cannot be 
resolved through "ceph orch . . ." commands. In the past 
I've found that sometimes just doing a mgr failover 
("ceph mgr fail") will clear this in the case of false 
reports, so that's what I'd try first. If that doesn't work, 
I'd maybe try checking if the osd is still listed in the 
crush map and, if so, remove it (first step in 
https://docs.ceph.com/en/latest/rados/operations/add-or-rm-osds/#removing-the-osd 
I think).

It's possible that the reason the daemon rm commands hung is 
that one of the cleanup operations cephadm was trying to run 
under the hood when removing the osd hung, and so the osd is 
still believed to be present by the cluster.
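
As a rough illustration of the comparison described above, 
here is a minimal Python sketch. It is not cephadm's actual 
code. It assumes "ceph node ls" prints JSON shaped as 
daemon type -> host -> list of ids, and that 
"ceph orch ps --format json" prints a list of objects with 
"daemon_type", "daemon_id" and "hostname" fields. The sample 
data and the helper names (daemons_from_node_ls, 
daemons_from_orch_ps, find_stray) are made up for the 
example.

```python
import json

# Hypothetical sample of `ceph node ls` output (assumed shape:
# daemon type -> host -> list of daemon ids).
NODE_LS_JSON = """
{
  "mon": {"host1": ["host1"]},
  "mgr": {"host1": ["host1.abcdef"]},
  "osd": {"host1": [0, 1], "host2": [2, 3]}
}
"""

# Hypothetical, trimmed sample of `ceph orch ps --format json` output.
# osd.2 and osd.3 are missing here, as they would be after removal.
ORCH_PS_JSON = """
[
  {"daemon_type": "mon", "daemon_id": "host1", "hostname": "host1"},
  {"daemon_type": "mgr", "daemon_id": "host1.abcdef", "hostname": "host1"},
  {"daemon_type": "osd", "daemon_id": "0", "hostname": "host1"},
  {"daemon_type": "osd", "daemon_id": "1", "hostname": "host1"}
]
"""


def daemons_from_node_ls(node_ls):
    """Flatten the nested node listing into (host, 'type.id') pairs."""
    return {
        (host, f"{daemon_type}.{daemon_id}")
        for daemon_type, hosts in node_ls.items()
        for host, ids in hosts.items()
        for daemon_id in ids
    }


def daemons_from_orch_ps(orch_ps):
    """Collect (host, 'type.id') pairs for daemons the orchestrator knows."""
    return {
        (d["hostname"], f'{d["daemon_type"]}.{d["daemon_id"]}')
        for d in orch_ps
    }


def find_stray(node_ls, orch_ps):
    """Daemons the cluster lists that the orchestrator has no record of."""
    return sorted(daemons_from_node_ls(node_ls) - daemons_from_orch_ps(orch_ps))


stray = find_stray(json.loads(NODE_LS_JSON), json.loads(ORCH_PS_JSON))
for host, name in stray:
    print(f"stray daemon {name} on host {host}")
```

With real output saved from the two commands in place of the 
sample strings, anything printed here that you know no longer 
exists would be a candidate for the false-report case above.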

- Adam

On Fri, Jan 28, 2022 at 11:28 AM Vladimir Brik 
<vladimir.brik@xxxxxxxxxxxxxxxx> wrote:

    Hello

    I needed to permanently remove two drives from my pool so I
    ran "ceph orch daemon rm XXX". The command hung for both
    OSDs, but the daemons were removed. I then purged the
    two OSDs.

    Now ceph status is complaining about them with
    CEPHADM_STRAY_DAEMON, but the daemons aren't running and
    are not showing up in ceph orch ps. If I try to "daemon rm"
    again I get Error EINVAL: Unable to find daemon(s).

    Anybody have an idea about what could have happened or how
    to stop ceph status from listing the non-existing daemons
    as stray?

    Thanks,

    Vlad
    _______________________________________________
    ceph-users mailing list -- ceph-users@xxxxxxx
    To unsubscribe send an email to ceph-users-leave@xxxxxxx
