I have found these two threads
https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/VHZ7IJ7PAL7L2INLSHNVYY7V7ZCXD46G/#TSWERUMAEEGZPSYXG6PSS4YMRXPP3L63
https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/NG5QVRTVCLLYNLK56CSYLIPE4WBFXS5U/#HJDBAJFX27KATC4WV2MKGLVGLN2HTWWD
but I couldn't quite figure out from them what to do.

I have since found remnants of the removed cephfs-cluster-node-{0,1,2} in
the CRUSH map buckets, which I removed with no effect on the health detail.
I also found that the ceph dashboard lists the non-existent
cephfs-cluster-node-2 among the hosts (while orch host ls doesn't). On the
other hand, device ls-by-host cephfs-cluster-node-2 lists an entry whose
device name coincides with a drive on the live host X, with mon.X listed
under daemons. Meanwhile, device ls-by-host X lists, among other things,
the exact same entry, except that its DEV column shows the actual device
nvme0n1, whereas for cephfs-cluster-node-2 that column was empty:

root@X:~# cephadm shell -- ceph device ls-by-host cephfs-cluster-node-2
DEVICE                                      DEV      DAEMONS  EXPECTED FAILURE
Samsung_SSD_970_PRO_1TB_S462NF0M310269L              mon.X

root@X:~# cephadm shell -- ceph device ls-by-host X
DEVICE                                      DEV      DAEMONS  EXPECTED FAILURE
Samsung_SSD_850_EVO_250GB_S2R6NX0J423123P   sdb      osd.3
Samsung_SSD_860_EVO_1TB_S3Z9NY0M431048H              mon.Y
Samsung_SSD_970_PRO_1TB_S462NF0M310269L     nvme0n1  mon.X

This output is confusing, since it also lists mon.Y when asked about host X.

I will continue investigating. If anyone has any hints on what to try or
where to look, I would be very grateful.

Jakub
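For reference, the different inventories can be cross-checked against each
other with roughly the following commands (a sketch only; X and
cephfs-cluster-node-2 are the host names used above):

# hosts and daemons as the cephadm orchestrator sees them
cephadm shell -- ceph orch host ls
cephadm shell -- ceph orch ps

# host names as recorded in the monitors' own metadata
cephadm shell -- ceph mon metadata

# leftover buckets in the CRUSH map
cephadm shell -- ceph osd crush tree

# the device view that produced the confusing output above
cephadm shell -- ceph device ls-by-host cephfs-cluster-node-2
cephadm shell -- ceph device ls-by-host X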
On Sun, 17 Nov 2024, 15:34 Tim Holloway, <timh@xxxxxxxxxxxxx> wrote:

> I think I can count 5 sources that Ceph can query to
> report/display/control its resources.
>
> 1. The /etc/ceph/ceph.conf file. Mostly supplanted by the Ceph
> configuration database.
>
> 2. The Ceph configuration database. A nameless key/value store internal
> to a Ceph filesystem. It's distributed (no fixed location) and accessed
> by Ceph commands and APIs.
>
> 3. Legacy Ceph resources. Stuff found under a host's /var/lib/ceph
> directory.
>
> 4. Managed Ceph resources. Stuff found under a host's
> /var/lib/ceph/{fsid} directory.
>
> 5. The live machine state of Ceph. Since this can vary not only from
> host to host but also from service to service, I don't think it is
> considered an authoritative source of information.
>
> Compounding this is that current releases of Ceph can all too easily
> end up in a "forbidden" state where you may have, for example, a legacy
> OSD.6 and a managed OSD.6 on the same host. In such a case, the system
> is generally operable but functionally corrupt, and ideally it should
> be corrected to remove the redundant resource.
>
> The real issue is that, depending on which Ceph interface you're
> querying (or which one "ceph health" is querying!), you don't always
> get your answer from a single authoritative source, so you'll get
> conflicting results and annoying error reports. The "stray daemon"
> condition is an especially egregious example of this: it is not only
> possible because of a false detection from one of the above sources,
> but can also, I think, come from "dead" daemons being referenced in
> CRUSH.
>
> You might want to run through this list's history for the "phantom
> host" postings I made back around this past June, because I was
> absolutely plagued with them. Eugen Block helped me eventually purge
> them all.
>
> Regards,
> Tim
>
> On Sat, 2024-11-16 at 21:42 +0100, Jakub Daniel wrote:
> > Hello,
> >
> > I'm pretty new to ceph deployment. I have set up my first cephfs
> > cluster using cephadm. Initially, I deployed ceph in 3 VirtualBox
> > instances that I called cephfs-cluster-node-{0, 1, 2}, just to test
> > things. Later, I added 5 more real hardware nodes. Later still, I
> > decided to remove the VirtualBox nodes, so I drained the OSDs and
> > removed the hosts. Suddenly, ceph status detail started reporting
> >
> > HEALTH_WARN 1 stray host(s) with 1 daemon(s) not managed by cephadm
> > [WRN] CEPHADM_STRAY_HOST: 1 stray host(s) with 1 daemon(s) not
> > managed by cephadm
> >     stray host cephfs-cluster-node-2 has 1 stray daemons: ['mon.X']
> >
> > cephfs-cluster-node-2 is no longer listed among the hosts, and it has
> > been offline (powered down) for tens of hours. mon.X doesn't even
> > belong to that node; it runs on one of the real hardware nodes. I am
> > unaware of mon.X ever having run on cephfs-cluster-node-2 (I never
> > noticed it among the systemd units).
> >
> > Where does cephadm shell -- ceph status detail get the idea that
> > something is stray? How can I address this?
> >
> > Thank you for any insights
> > Jakub
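A minimal sketch of follow-up commands for this kind of stale stray-host
record, assuming the daemons themselves are healthy and only the cephadm
bookkeeping is out of date (host and daemon names are the placeholders
used in this thread; the config option is the stock mgr/cephadm one):

# make the cephadm module rebuild its inventory by failing over the active mgr
cephadm shell -- ceph mgr fail

# check which host name mon.X itself reports in its metadata
cephadm shell -- ceph mon metadata X

# list the daemons cephadm currently manages on the live host
cephadm shell -- ceph orch ps X

# last resort: silence the warning instead of fixing the record
cephadm shell -- ceph config set mgr mgr/cephadm/warn_on_stray_hosts false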