Thanks very much! "ceph mgr fail" did the trick.
It's weird: I thought I had rebooted both managers since the
problem occurred, but maybe they rebooted too quickly and no
failover actually happened.
Vlad
On 1/28/22 13:31, Adam King wrote:
Hello Vlad,
Just some insight into how CEPHADM_STRAY_DAEMON works: This
health warning is specifically designed to point out daemons
in the cluster that cephadm is not aware of/in control of.
It does this by comparing the daemons it has cached info on
(this cached info is what you see in "ceph orch ps") with
the return value of a core mgr function designed to list the
servers in the cluster and what daemons are on them. This
function, from cephadm's point of view, is a bit of a black
box (by design, as it is meant to find daemons cephadm is
not aware of/in control of). If you'd like to see a rough
approximation of what that looks like, I'd check the output
of "ceph node ls" (you may see your non-existent osds listed
there). This means that a non-existent daemon that cephadm
is falsely reporting as stray typically cannot be resolved
through "ceph orch ..." commands. In
the past I've found sometimes just doing a mgr failover
("ceph mgr fail") will clear this in the case of false
reports, so that's what I'd try first. If that doesn't work,
I'd try checking whether the osd is still listed in the
crush map and, if so, remove it (first step in
https://docs.ceph.com/en/latest/rados/operations/add-or-rm-osds/#removing-the-osd
I think). It's possible that the daemon rm commands hung
because one of the cleanup operations cephadm was trying to
run under the hood when removing the osd hung, so the
cluster still believes the osd is present.
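To make the comparison described above concrete, here is a rough sketch (not cephadm's actual code) of how one might diff the two views by hand. It assumes "ceph node ls --format json" returns a mapping of daemon type -> host -> list of ids, and that "ceph orch ps --format json" returns a list of objects with "daemon_type" and "daemon_id" fields. The function name, host names and ids below are made up for illustration.

```python
def find_stray_candidates(node_ls, orch_ps):
    """Return (host, daemon_name) pairs present in the 'ceph node ls'
    view but absent from the 'ceph orch ps' view.

    Assumed input shapes (not taken from the thread):
      node_ls: {"osd": {"host1": [0, 1]}, "mgr": {"host1": ["x"]}, ...}
      orch_ps: [{"daemon_type": "osd", "daemon_id": "0", ...}, ...]
    """
    # Daemons cephadm has cached info on, keyed by (type, id).
    # Ids are compared as strings because node ls may report osd ids
    # as integers.
    known = {(d["daemon_type"], str(d["daemon_id"])) for d in orch_ps}

    stray = []
    for daemon_type, hosts in node_ls.items():
        for host, ids in hosts.items():
            for daemon_id in ids:
                if (daemon_type, str(daemon_id)) not in known:
                    stray.append((host, f"{daemon_type}.{daemon_id}"))
    return sorted(stray)


# Made-up sample data: osd.7 is still listed for host2 by node ls,
# but cephadm no longer knows about it.
sample_node_ls = {
    "mgr": {"host1": ["host1.abcdef"]},
    "osd": {"host1": [0, 1], "host2": [2, 7]},
}
sample_orch_ps = [
    {"daemon_type": "mgr", "daemon_id": "host1.abcdef", "hostname": "host1"},
    {"daemon_type": "osd", "daemon_id": "0", "hostname": "host1"},
    {"daemon_type": "osd", "daemon_id": "1", "hostname": "host1"},
    {"daemon_type": "osd", "daemon_id": "2", "hostname": "host2"},
]

strays = find_stray_candidates(sample_node_ls, sample_orch_ps)
print(strays)
```

With real data you would feed in the parsed JSON from the two commands. Anything this reports that you know no longer exists is a candidate for the mgr failover or crush map cleanup mentioned above. The sketch matches on type and id only, since host names may be reported differently in the two outputs.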
- Adam
On Fri, Jan 28, 2022 at 11:28 AM Vladimir Brik
<vladimir.brik@xxxxxxxxxxxxxxxx> wrote:
Hello
I needed to permanently remove two drives from my pool so I
ran "ceph orch daemon rm XXX". The command hung for both
OSDs, but the daemons were removed. I then purged the
two OSDs.
Now ceph status is complaining about them with
CEPHADM_STRAY_DAEMON, but the daemons aren't running and are
not showing up in "ceph orch ps". If I try to "daemon rm"
again I get "Error EINVAL: Unable to find daemon(s)".
Does anybody have an idea about what could have happened or
how to stop ceph status from listing the non-existent
daemons as stray?
Thanks,
Vlad
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx