Hi Eugen,

It's gone now, although similar artefacts seem to linger.

The reason it's gone is that I've been upgrading all my machines to AlmaLinux 8 from CentOS 7 and AlmaLinux 7, as one is already EOL and the other is within days of it. Rather than upgrade in place, I chose to nuke and replace the entire system disks and provision from scratch. That helped me clean up my network and get rid of years of cruft. Ceph helped a lot there: since I did one machine at a time, and since the provisioning data is on Ceph, it was always available even as individual machines went up and down.

I lost the phantom host, although for a while one of the newer OSDs gave me issues. The container would die while starting, claiming (roughly quoted) that the OSD block device was "already in use". I believe this happened right after I moved the _admin node to that machine. I finally got the failed machine back online by manually stopping the systemd service, waiting a while, then starting (not restarting) it (roughly the sequence sketched below). But some other nodes may have been rebooted in the interim, so it's hard to be certain what actually made it happy.
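For reference, the stop/wait/start dance was something along these lines. This is a rough sketch only: the unit name follows the usual cephadm naming pattern, and <fsid> and <id> are placeholders for the cluster fsid and the affected OSD.

  # stop the OSD's systemd unit, then give the old container time to release the block device
  systemctl stop ceph-<fsid>@osd.<id>.service
  sleep 120
  # start (not restart) the unit, then confirm it stays up
  systemctl start ceph-<fsid>@osd.<id>.service
  systemctl status ceph-<fsid>@osd.<id>.service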
Annoyingly, the dashboard and the OSD tree listed the failed node's OSD as "up" and "in" even though "ceph orch ps" showed it as "error". I couldn't persuade it to go down and out, or I would have destroyed and re-created it.

I did clear up a major mess, though. My original install/admin machine was littered with dead and mangled objects. Two long-deleted OSDs left traces, and there was a mix of pre-cephadm components (including OSDs) and newer stuff. I did discover a node still running Octopus, which I plan to upgrade today, but overall things look pretty clean, excepting the ever-frustrating "too many PGs per OSD" warning. If the PG autoscaler was supposed to fix this automatically, it isn't doing so, even though it's switched on. Manual changes don't seem to take either.

Going back to the phantom host situation, one thing I have seen is that on the dashboard, the hosts view lists OSDs that have been deleted as still belonging to that machine. "ceph osd tree" and the OSD view disagree and show neither the phantom host nor the deleted OSDs.

Just to recap, the original phantom host was a non-Ceph node that accidentally got sucked in when I did a host add with the wrong IP address. It then claimed to own another host's OSD.

Thanks,
   Tim

On Tue, 2024-07-09 at 06:08 +0000, Eugen Block wrote:
> Hi Tim,
>
> is this still an issue? If it is, I recommend adding some more
> details so it's easier to follow your train of thought:
>
> ceph osd tree
> ceph -s
> ceph health detail
> ceph orch host ls
>
> And then please point out which host you're trying to get rid of. I
> would deal with the rgw thing later. Is it possible that the phantom
> host actually had OSDs on it? Maybe that needs to be cleaned up
> first. I had something similar on a customer cluster recently where
> we hunted failing OSDs, but it turned out they were removed quite a
> while ago, just not properly cleaned up yet on the filesystem.
>
> Thanks,
> Eugen
>
> Zitat von Tim Holloway <timholloway34@xxxxxxxxx>:
>
> > It's getting worse.
> >
> > As many may be aware, the venerable CentOS 7 OS is hitting
> > end-of-life in a matter of days.
> >
> > The easiest way to upgrade my servers has been to simply create an
> > alternate disk with the new OS, turn my provisioning system loose
> > on it, yank the old OS system disk and jack in the new one.
> >
> > However, Ceph is another matter. For that part, the simplest thing
> > to do is to destroy the Ceph node(s) on the affected box, do the
> > OS upgrade, then re-create the nodes.
> >
> > But now I have even MORE strays. The OSD on my box lives on in
> > Ceph in the dashboard host view even though the documented removal
> > procedures were followed and the VM itself was destroyed.
> >
> > Further, this last node is an RGW node and I cannot remove it from
> > the RGW configuration. It not only shows on the dashboard, it also
> > lists as still active on the command line and has entries in the
> > config database, no matter what I do.
> >
> > I really need some solution to this, as it's a major chokepoint in
> > the upgrade process.
> >
> > Tim

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx