Thanks. I did resolve that problem, though I haven't had a chance to post an update until now.

I had already attempted to use ceph orch to remove the daemons, but those removals didn't succeed. Fortunately, I was able to bring the host back online, which allowed the scheduled removals to complete. I confirmed everything was drained, removed the host from the inventory again, and powered it down. I still got complaints from cephadm about the decommissioned host, so I took a break - impatience and Ceph don't mix - and came back to address the next problem, which was lots of stuck PGs. Either because cephadm finally timed out or because something kicked in when I started randomly rebooting OSDs, the host complaint eventually disappeared. End of story.

Now for what sent me down that path. I had two OSDs on one server and felt that was probably not a good idea, so I marked one for deletion. Four days later it was still in the "destroying" state. More concerning, all signs indicated that despite having been reweighted to 0, the "destroying" OSD was still an essential participant, with no indication that its PGs were being relocated to active servers. Shutting down the "destroying" OSD would immediately trigger a re-allocation panic, but that didn't clean anything up: the re-allocation would proceed at a furious pace, then slowly stall out and hang, leaving the system degraded. Restarting the OSD brought the PG inventory back up, but data still wasn't moving off the OSD. Right about that time I decommissioned the questionable host.

Finally, I did a "ceph orch rm osd.x" and terminated the "destroying" OSD permanently, making it at last disappear from the OSD tree. I also deleted a number of pools that are (hopefully) not going to be missed. Kicking and randomly rebooting the other OSDs finally cleared all the stuck PGs, some of which hadn't resolved in over two days.

So at the moment it's either rebalancing the cleaned-up OSDs or stuck in a loop thinking that it is. The PG-per-OSD count also seems way too high, but the autoscaler doesn't seem to want to do anything about it.
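For what it's worth, the usual way to see what the autoscaler thinks it should be doing, and whether it is actually enabled per pool, is roughly the following (the pool name is just a placeholder):

    # Show per-pool PG counts, targets, and the autoscaler's recommendations
    ceph osd pool autoscale-status

    # Check whether autoscaling is enabled for a given pool, and turn it on if not
    ceph osd pool get <pool-name> pg_autoscale_mode
    ceph osd pool set <pool-name> pg_autoscale_mode on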
Of course, the whole shebang has been unavailable to clients this whole week because of all that.

I've been considering upgrading to Reef, but recent posts about issues resembling what I've been going through are making me pause.

Again, thanks!

   Tim

On Wed, 2025-02-26 at 13:57 +0100, Frédéric Nass wrote:
> Hi Tim,
>
> If you can't bring the host back online so that cephadm can remove
> these services itself, I guess you'll have to clean up the mess by:
>
> - removing these services from the cluster (for example with a 'ceph
>   mon remove {mon-id}' for the monitor)
> - forcing their removal from the orchestrator with the --force option
>   on the 'ceph orch daemon rm <names>' and 'ceph orch host rm
>   <hostname>' commands. If the --force option doesn't help, then
>   looking into/editing/removing ceph config-key entries like
>   'mgr/cephadm/inventory' and
>   'mgr/cephadm/host.ceph07.internal.mousetech.com', which show up in
>   the 'ceph config-key dump' output, might help.
>
> Regards,
> Frédéric.
>
> ----- Le 25 Fév 25, à 16:42, Tim Holloway timh@xxxxxxxxxxxxx a écrit :
>
> > Ack. Another fine mess.
> >
> > I was trying to clean things up, and the process of tossing around
> > OSDs kept getting me reports of slow responses and hanging PG
> > operations.
> >
> > This is Ceph Pacific, by the way.
> >
> > I found a deprecated server that claimed to have an OSD even though
> > it didn't show in either "ceph osd tree" or the dashboard OSD list.
> > I suspect that a lot of the grief came from it attempting to use
> > resources that weren't always seen as resources.
> >
> > I shut down the server's OSD (removed the daemon using ceph orch),
> > then foolishly deleted the server from the inventory without doing
> > a drain first.
> >
> > Now cephadm hates me (key not found), and there are still an MDS
> > and a MON listed as ceph orch ls daemons even after I powered the
> > host off.
> >
> > I cannot do a ceph orch daemon delete because there's no longer an
> > IP address available to the daemon delete, and I cannot clear the
> > cephadm queue:
> >
> > [ERR] MGR_MODULE_ERROR: Module 'cephadm' has failed:
> > 'ceph07.internal.mousetech.com'
> >
> > Any suggestions?

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
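As concrete commands, the forced-removal sequence Frédéric outlines above would look roughly like the following sketch; the daemon name and monitor ID are placeholders, and the config-key is the one mentioned in the thread:

    # See what the orchestrator still believes exists
    ceph orch ps
    ceph orch host ls

    # Force-remove the stale daemons, then the dead host itself
    ceph orch daemon rm <daemon-name> --force
    ceph orch host rm ceph07.internal.mousetech.com --force

    # Remove the stale monitor from the monitor map
    ceph mon remove <mon-id>

    # If the orchestrator still complains, inspect and clean up cephadm's state keys
    ceph config-key dump | grep mgr/cephadm
    ceph config-key rm mgr/cephadm/host.ceph07.internal.mousetech.com

    # Restarting the active mgr afterwards (not mentioned in the thread, but often
    # needed so the cephadm module reloads its state)
    ceph mgr fail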