Re: Removed daemons listed as stray

I had a similar situation, and the only thing that solved it was a full
reboot of the cluster (the reboot was caused by a watchdog alarm), but when
the cluster came back, the stray OSDs were gone.

On Fri, 28 Jan 2022, 19:32 Adam King, <adking@xxxxxxxxxx> wrote:

> Hello Vlad,
>
> Just some insight into how CEPHADM_STRAY_DAEMON works: this health warning
> is specifically designed to point out daemons in the cluster that cephadm
> is not aware of or not in control of. It does this by comparing the daemons
> it has cached info on (this cached info is what you see in "ceph orch ps")
> with the return value of a core mgr function that lists the servers in the
> cluster and the daemons on them. From cephadm's point of view this function
> is a bit of a black box (by design, as it is meant to find daemons cephadm
> is not aware of or not in control of). If you'd like to see a rough
> estimate of what it returns, I'd check the output of "ceph node ls" (you
> may see your non-existent OSDs listed there). This means that a
> non-existent daemon which cephadm falsely reports as stray typically cannot
> be cleared with "ceph orch ..." commands. In the past I've found that
> simply doing a mgr failover ("ceph mgr fail") sometimes clears false
> reports, so that's what I'd try first. If that doesn't work, I'd check
> whether the OSD is still listed in the CRUSH map and, if so, remove it (I
> think that's the first step in
> https://docs.ceph.com/en/latest/rados/operations/add-or-rm-osds/#removing-the-osd).
> It's possible that the daemon rm commands hung because one of the cleanup
> operations cephadm runs under the hood when removing an OSD hung, so the
> OSD is still believed to be present by the cluster.
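To make the comparison Adam describes more concrete, here is a hypothetical Python sketch of the two checks: which daemons the cluster map reports that cephadm does not know about, and whether a removed OSD still appears in the CRUSH tree. It is not cephadm's actual code (cephadm uses an internal mgr function, not "ceph node ls"), and the function names are made up for illustration. It assumes "ceph node ls" JSON is shaped like {daemon_type: {host: [ids]}} and that "ceph osd tree -f json" output has a "nodes" list whose entries carry "id" and "type"; please check both shapes against your own release.

```python
from __future__ import annotations

import json


def reported_daemons(node_ls: dict) -> set[str]:
    """Flatten an assumed {daemon_type: {host: [ids]}} map into names like "osd.3"."""
    names = set()
    for daemon_type, hosts in node_ls.items():
        for ids in hosts.values():
            for daemon_id in ids:
                names.add(f"{daemon_type}.{daemon_id}")
    return names


def stray_daemons(node_ls: dict, orch_known: set[str]) -> set[str]:
    """Daemons the cluster map reports but cephadm's cache (orch ps) does not list."""
    return reported_daemons(node_ls) - orch_known


def osd_in_crush(osd_tree: dict, osd_id: int) -> bool:
    """True if osd.<osd_id> still appears as a node in an assumed osd-tree JSON dump."""
    return any(
        node.get("type") == "osd" and node.get("id") == osd_id
        for node in osd_tree.get("nodes", [])
    )


if __name__ == "__main__":
    # Hypothetical sample data: osd.7 was removed but is still reported.
    node_ls = json.loads('{"mon": {"host1": ["host1"]}, "osd": {"host1": [0, 7]}}')
    orch_known = {"mon.host1", "osd.0"}
    print(sorted(stray_daemons(node_ls, orch_known)))  # ['osd.7']

    osd_tree = {"nodes": [{"id": -1, "name": "default", "type": "root"},
                          {"id": 0, "name": "osd.0", "type": "osd"}]}
    print(osd_in_crush(osd_tree, 7))  # False
```

With real data you would load the JSON printed by "ceph node ls" and "ceph osd tree -f json" and build the known-name set from "ceph orch ps"; an OSD that shows up as stray but is absent from the CRUSH tree points to the false-report case, where "ceph mgr fail" is the first thing to try.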
>
> - Adam
>
> On Fri, Jan 28, 2022 at 11:28 AM Vladimir Brik <vladimir.brik@xxxxxxxxxxxxxxxx> wrote:
>
> > Hello
> >
> > I needed to permanently remove two drives from my pool so I
> > ran "ceph orch daemon rm XXX". The command hung for both
> > OSDs, but the daemons were removed. I then purged the two OSDs.
> >
> > Now ceph status is complaining about them with
> > CEPHADM_STRAY_DAEMON, but the daemons aren't running and are
> > not showing up in ceph orch ps. If I try to "daemon rm"
> > again I get Error EINVAL: Unable to find daemon(s).
> >
> > Anybody have an idea about what could have happened or how
> > to stop ceph status from listing the non-existing daemons as
> > stray?
> >
> >
> > Thanks,
> >
> > Vlad
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
> >
> >
>