Re: Removed daemons listed as stray

Thanks very much! "ceph mgr fail" did the trick.

It's weird: I thought I had rebooted both managers since the problem occurred, but maybe they rebooted too quickly and no failover actually happened.

Vlad


On 1/28/22 13:31, Adam King wrote:
Hello Vlad,

Just some insight into how CEPHADM_STRAY_DAEMON works: this health warning is specifically designed to point out daemons in the cluster that cephadm is not aware of or in control of. It does this by comparing the daemons it has cached info on (this cached info is what you see in "ceph orch ps") with the return value of a core mgr function designed to list the servers in the cluster and the daemons on them. This function, from cephadm's point of view, is a bit of a black box (by design, as it is meant to find daemons cephadm is not aware of or in control of). If you'd like to see a rough approximation of what it returns, I'd check the output of "ceph node ls" (you may see your non-existent OSDs listed there).

This means that a non-existent daemon that cephadm is falsely reporting as stray typically cannot be resolved through "ceph orch ..." commands. In the past I've found that sometimes just doing a mgr failover ("ceph mgr fail") will clear false reports, so that's what I'd try first. If that doesn't work, I'd try checking whether the OSD is still listed in the CRUSH map and, if so, remove it (the first step in https://docs.ceph.com/en/latest/rados/operations/add-or-rm-osds/#removing-the-osd I think). It's possible that the daemon rm commands hung because one of the cleanup operations cephadm runs under the hood when removing an OSD hung, and so the cluster still believes the OSD is present.
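
To make the comparison above concrete, here is a minimal Python sketch of the idea, not cephadm's actual code. It assumes "ceph node ls" prints JSON shaped like {type: {host: [ids]}} and that "ceph orch ps --format json" prints a list of objects with "hostname", "daemon_type" and "daemon_id" fields; the host names and daemon IDs are made up for illustration.

```python
import json

def daemons_from_node_ls(node_ls):
    """Flatten an assumed `ceph node ls` structure ({type: {host: [ids]}})
    into a set of (host, "type.id") pairs."""
    found = set()
    for daemon_type, hosts in node_ls.items():
        for host, ids in hosts.items():
            for daemon_id in ids:
                found.add((host, f"{daemon_type}.{daemon_id}"))
    return found

def daemons_from_orch_ps(orch_ps):
    """Build the same kind of set from an assumed `ceph orch ps` JSON list."""
    return {
        (d["hostname"], f'{d["daemon_type"]}.{d["daemon_id"]}')
        for d in orch_ps
    }

def find_stray(node_ls, orch_ps):
    """Daemons the cluster lists but the orchestrator cache does not know."""
    return sorted(daemons_from_node_ls(node_ls) - daemons_from_orch_ps(orch_ps))

# Hypothetical sample data: osd.2 on host2 was removed from the orchestrator
# cache but is still listed by the cluster.
NODE_LS = json.loads(
    '{"mgr": {"host1": ["a"]}, "osd": {"host1": [0, 1], "host2": [2]}}'
)
ORCH_PS = [
    {"hostname": "host1", "daemon_type": "mgr", "daemon_id": "a"},
    {"hostname": "host1", "daemon_type": "osd", "daemon_id": "0"},
    {"hostname": "host1", "daemon_type": "osd", "daemon_id": "1"},
]

for host, name in find_stray(NODE_LS, ORCH_PS):
    print(f"stray daemon {name} on host {host}")
```

On a real cluster you would feed in the parsed output of the two commands instead of the sample data. Anything left over in the difference is what would show up as stray, which is why stale entries on the mgr side cannot be fixed from the "ceph orch" side.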

- Adam

On Fri, Jan 28, 2022 at 11:28 AM Vladimir Brik <vladimir.brik@xxxxxxxxxxxxxxxx> wrote:

    Hello

    I needed to permanently remove two drives from my pool so I
    ran "ceph orch daemon rm XXX". The command hung for both
    OSDs, but the daemons were removed. I then purged the
    two OSDs.

    Now ceph status is complaining about them with
    CEPHADM_STRAY_DAEMON, but the daemons aren't running and
    are not showing up in ceph orch ps. If I try to "daemon rm"
    again I get Error EINVAL: Unable to find daemon(s).

    Anybody have an idea about what could have happened or how
    to stop ceph status from listing the non-existing daemons
    as stray?


    Thanks,

    Vlad
    _______________________________________________
    ceph-users mailing list -- ceph-users@xxxxxxx
    To unsubscribe send an email to ceph-users-leave@xxxxxxx




