I think I can count 5 sources that Ceph can query to report/display/control its resources:

1. The /etc/ceph/ceph.conf file. Mostly supplanted by the Ceph configuration database.

2. The Ceph configuration database. A key/value store with no file of its own, internal to the Ceph cluster. It's distributed (no fixed location) and is accessed through Ceph commands and APIs.

3. Legacy Ceph resources: what lives directly under a host's /var/lib/ceph directory.

4. Managed (cephadm) Ceph resources: what lives under a host's /var/lib/ceph/{fsid} directory.

5. The live machine state of Ceph. Since this can vary not only from host to host but also from service to service, I don't think it is considered an authoritative source of information.

(I've put a rough sketch of how to query each of these at the bottom of this mail.)

Compounding this is that current releases of Ceph can all too easily end up in a "forbidden" state where you may have, for example, a legacy osd.6 and a managed osd.6 on the same host. In such a case the system is generally operable but functionally corrupt, and ideally it should be corrected by removing the redundant resource.

The real issue is that, depending on which Ceph interface you're querying (or which one "ceph health" is querying!), you don't always get your answer from a single authoritative source, so you can get conflicting results and annoying error reports.

The "stray daemon" condition is an especially egregious example of this: it can arise not only from a false detection in one of the above sources but also, I think, from "dead" daemons still being referenced in CRUSH.

You might want to search this list's archives for the "phantom host" postings I made back around this past June; I was absolutely plagued with them, and Eugen Block eventually helped me purge them all.

Regards,

Tim

On Sat, 2024-11-16 at 21:42 +0100, Jakub Daniel wrote:
> Hello,
>
> I'm pretty new to ceph deployment. I have setup my first cephfs cluster
> using cephadm. Initially, I deployed ceph in 3 virtualbox instances that I
> called cephfs-cluster-node-{0, 1, 2} just to test things. Later, I added 5
> more real hardware nodes. Later I decided I'd remove the virtualboxes, so I
> drained the osds and removed the hosts. Suddenly, ceph status detail
> started reporting
>
> HEALTH_WARN 1 stray host(s) with 1 daemon(s) not managed by cephadm
> [WRN] CEPHADM_STRAY_HOST: 1 stray host(s) with 1 daemon(s) not managed by cephadm
>     stray host cephfs-cluster-node-2 has 1 stray daemons: ['mon.X']
>
> The cephfs-cluster-node-2 is no longer listed among hosts, it is (and has
> been for tens of hours) offline (powered down). The mon.X doesn't even
> belong to that node, it is one of the real hardware nodes. I am unaware of
> mon.X ever running on cephfs-cluster-node-2 (never noticed it among systemd
> units).
>
> Where does cephadm shell -- ceph status detail come to the conclusion there
> is something stray? How can I address this?
>
> Thank you for any insights
> Jakub
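
P.S. For what it's worth, here is roughly how I go about querying each of those five sources when a report doesn't add up. This is only a sketch, not a recipe; the fsid and hostnames are placeholders for whatever your cluster actually uses:

    # 1. The local config file (per host)
    cat /etc/ceph/ceph.conf

    # 2. The cluster configuration database (held by the MONs)
    ceph config dump

    # 3 and 4. Legacy vs. cephadm-managed resources on a host
    #    (run on the host itself; "cephadm ls" reports each daemon's style)
    ls /var/lib/ceph/
    ls /var/lib/ceph/<fsid>/
    cephadm ls

    # 5. Live state, as the cluster and the host each see it
    ceph orch host ls
    ceph orch ps
    ceph osd tree
    ceph mon dump
    systemctl list-units 'ceph*'    # on the host

For the specific warning Jakub is seeing, I'd compare what "ceph orch host ls", "ceph orch ps", "ceph mon dump" and "cephadm ls" (run on the surviving hosts) each say about cephfs-cluster-node-2 and mon.X; whichever source still mentions the drained host is probably where the stray report is coming from. A manager failover ("ceph mgr fail") is sometimes enough to make cephadm refresh stale inventory, but I can't promise that's the issue here.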