Hey Adam,

thanks again for your help.

I finally got around to executing your suggested procedure. It went mostly fine, except that when I renamed the last host, I ended up with a rogue mon. I assume a new mon was created on a different host while the last one was "out" of cephadm. The remaining mon on the last host is now not cleaned up by cephadm (maybe because it is a legacy daemon).

I got the following warnings (relevant sections):

[WRN] CEPHADM_APPLY_SPEC_FAIL: Failed to apply 1 service(s): osd.all-available-devices
    osd.all-available-devices: host osd-mirror-1 `cephadm ceph-volume` failed: cephadm exited with an error code: 1, stderr:Inferring config /var/lib/ceph/7efa00f9-182f-40f4-9136-d51895db1f0b/mon.osd-mirror-1/config
    ERROR: [Errno 2] No such file or directory: '/var/lib/ceph/7efa00f9-182f-40f4-9136-d51895db1f0b/mon.osd-mirror-1/config'
[WRN] CEPHADM_REFRESH_FAILED: failed to probe daemons or devices
    host osd-mirror-1 `cephadm ceph-volume` failed: cephadm exited with an error code: 1, stderr:Inferring config /var/lib/ceph/7efa00f9-182f-40f4-9136-d51895db1f0b/mon.osd-mirror-1/config
    ERROR: [Errno 2] No such file or directory: '/var/lib/ceph/7efa00f9-182f-40f4-9136-d51895db1f0b/mon.osd-mirror-1/config'

The mon is not running and not required anymore. It is no longer listed by `systemctl`, `ceph orch ps`, or `ceph status`:

    mon: 3 daemons, quorum osd-mirror-3,osd-mirror-2,osd-mirror-6 (age 3m)

But cephadm is still aware of the late daemon and is trying to use (?) it.
From `cephadm ls`:

    {
        "style": "legacy",
        "name": "mon.osd-mirror-1",
        "fsid": "7efa00f9-182f-40f4-9136-d51895db1f0b",
        "systemd_unit": "ceph-mon@osd-mirror-1",
        "enabled": false,
        "state": "unknown",
        "host_version": "15.2.14"
    },

I tried to remove it (which didn't help):

    cephadm rm-daemon --name mon.osd-mirror-1 --fsid 7efa00f9-182f-40f4-9136-d51895db1f0b --force --force-delete-data

And then I figured out that cephadm is not supposed to remove legacy daemons:
https://tracker.ceph.com/issues/45976

I also tried some manual removal, without success:

    0|0[root@osd-mirror-1 ~]# service ceph -a stop mon.osd-mirror-1
    The service command supports only basic LSB actions (start, stop, restart, try-restart, reload, force-reload, status). For other actions, please try to use systemctl.
    0|0[root@osd-mirror-1 ~]# ceph mon remove osd-mirror-1
    mon.osd-mirror-1 does not exist or has already been removed

What other options do I have to remove this daemon? That is, how do I clear the remaining information cephadm keeps, which makes it think the mon is still available?

Thanks again for all your help.

Best,
Mathias

On 5/20/2022 5:16 PM, Adam King wrote:

To clarify a bit, "ceph orch host rm <hostname> --force" won't actually touch any of the daemons on the host. It just stops cephadm from managing the host, i.e. it won't add/remove daemons on the host. If you remove the host and then re-add it with the new host name, nothing should actually happen to the daemons there. The only possible exception is if you have services whose placement uses count and one of the daemons from that service is on the host being temporarily removed. It's possible it could try to deploy that daemon on another host in the interim. However, OSDs are never like that, so there would never be any need for flags like no-out or no-backfill. The worst case would be it moving a mon or mgr around. If you make sure all the important services are deployed by label, explicit hosts, etc.
(just not count) then there should be no risk of any daemons moving at all and this is a pretty safe operation.

On Fri, May 20, 2022 at 3:36 AM Kuhring, Mathias <mathias.kuhring@xxxxxxxxxxxxxx> wrote:

Hey Adam,

thanks for your fast reply.

That's a bit more invasive and risky than I was hoping for. But if this is the only way, I guess we need to do this.

Would it be advisable to set some maintenance flags like noout, nobackfill, norebalance? And maybe stop the ceph target on the host I'm re-adding to pause all daemons?

Best,
Mathias

On 5/19/2022 8:14 PM, Adam King wrote:

cephadm just takes the hostname given in the "ceph orch host add" commands and assumes it won't change. The FQDN names (or whatever "ceph orch host ls" shows in any scenario) come from whatever input was given in those commands. Cephadm will even try to verify that the hostname matches what is given when adding the host. As for where it is stored, we keep that info in the mon key store and it isn't meant to be manually updated (ceph config-key get mgr/cephadm/inventory). That said, people have occasionally run into issues related to a mismatch between an FQDN and a short name.

There's no built-in command for changing a hostname because of the expectation that it won't change. However, you should be able to fix this by removing and re-adding the host, e.g. "ceph orch host rm osd-mirror-1.our.domain.org" followed by "ceph orch host add osd-mirror-1 172.16.62.22 --labels rgw --labels osd". If you're on a late enough version that it requests you drain the host before we'll remove it (it was some pacific dot release, I don't remember which one), you can pass --force to the host rm command. Generally it's not a good idea to remove hosts from cephadm's control while there are still cephadm-deployed daemons on them like that, but this is a special case.
Anyway, removing and re-adding the host is the only (reasonable) way to change what it has stored for the hostname that I can remember.

Let me know if that doesn't work,
- Adam King

On Thu, May 19, 2022 at 1:41 PM Kuhring, Mathias <mathias.kuhring@xxxxxxxxxxxxxx> wrote:

Dear ceph users,

one of our clusters is complaining about plenty of stray hosts and daemons. Pretty much all of them.

[WRN] CEPHADM_STRAY_HOST: 6 stray host(s) with 280 daemon(s) not managed by cephadm
    stray host osd-mirror-1 has 47 stray daemons: ['mgr.osd-mirror-1.ltmyyh', 'mon.osd-mirror-1', 'osd.1', ...]
    stray host osd-mirror-2 has 46 stray daemons: ['mon.osd-mirror-2', 'osd.0', ...]
    stray host osd-mirror-3 has 48 stray daemons: ['cephfs-mirror.osd-mirror-3.qzcuvv', 'mgr.osd-mirror-3', 'mon.osd-mirror-3', 'osd.101', ...]
    stray host osd-mirror-4 has 47 stray daemons: ['mds.cephfs.osd-mirror-4.omjlxu', 'mgr.osd-mirror-4', 'osd.103', ...]
    stray host osd-mirror-5 has 46 stray daemons: ['mgr.osd-mirror-5', 'osd.139', ...]
    stray host osd-mirror-6 has 46 stray daemons: ['mds.cephfs.osd-mirror-6.hobjsy', 'osd.141', ...]

It all seems to boil down to the host names from `ceph orch host ls` not matching other configurations.
ceph orch host ls
HOST                         ADDR          LABELS       STATUS
osd-mirror-1.our.domain.org  172.16.62.22  rgw osd
osd-mirror-2.our.domain.org  172.16.62.23  rgw osd
osd-mirror-3.our.domain.org  172.16.62.24  rgw osd
osd-mirror-4.our.domain.org  172.16.62.25  rgw mds osd
osd-mirror-5.our.domain.org  172.16.62.32  rgw osd
osd-mirror-6.our.domain.org  172.16.62.33  rgw mds osd

hostname
osd-mirror-6

hostname -f
osd-mirror-6.our.domain.org

0|0[root@osd-mirror-6 ~]# ceph mon metadata | grep "\"hostname\""
        "hostname": "osd-mirror-1",
        "hostname": "osd-mirror-3",
        "hostname": "osd-mirror-2",
0|1[root@osd-mirror-6 ~]# ceph mgr metadata | grep "\"hostname\""
        "hostname": "osd-mirror-1",
        "hostname": "osd-mirror-3",
        "hostname": "osd-mirror-4",
        "hostname": "osd-mirror-5",

The documentation states that "cephadm demands that the name of host given via `ceph orch host add` equals the output of `hostname` on remote hosts.".

https://docs.ceph.com/en/latest/cephadm/host-management/#fully-qualified-domain-names-vs-bare-host-names
https://docs.ceph.com/en/octopus/cephadm/concepts/?#fully-qualified-domain-names-vs-bare-host-names

But it seems our cluster wasn't set up like this.

How can I now change the host names which were assigned when adding the hosts with `ceph orch host add HOSTNAME`?

I can't seem to find any documentation on changing the host names which are listed by `ceph orch host ls`. All I can find is related to changing the actual name of the host in the system. The crush map also contains just the bare host names. So, where are these FQDN names actually registered?

Thank you for your help.
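
The mismatch described above (FQDNs registered in cephadm, bare names reported by the daemons) can be checked mechanically. Below is a minimal sketch in Python, not an official tool: the function name `find_name_mismatches` is hypothetical, and the two input lists are typed in by hand from the output shown in this message rather than fetched from the cluster.

```python
def find_name_mismatches(orch_hosts, daemon_hostnames):
    """Return (orch_name, bare_name) pairs where the name registered in
    cephadm differs from the bare host name that daemons report.

    Hypothetical helper for illustration; it only compares strings and
    does not talk to a Ceph cluster.
    """
    reported = set(daemon_hostnames)
    mismatches = []
    for name in orch_hosts:
        # Everything before the first dot is treated as the bare host name.
        bare = name.split(".", 1)[0]
        if name not in reported and bare in reported:
            mismatches.append((name, bare))
    return mismatches


# Host names as listed by `ceph orch host ls` (first three rows only).
orch_hosts = [
    "osd-mirror-1.our.domain.org",
    "osd-mirror-2.our.domain.org",
    "osd-mirror-3.our.domain.org",
]
# Host names as printed by `ceph mon metadata | grep hostname`.
mon_hostnames = ["osd-mirror-1", "osd-mirror-3", "osd-mirror-2"]

for orch_name, bare in find_name_mismatches(orch_hosts, mon_hostnames):
    print(f"{orch_name} is registered in cephadm, but daemons report {bare}")
```

A host appears in the result only if its registered name is absent from the daemon metadata while its bare name is present, which is the pattern visible in the outputs above.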

Best regards,
Mathias

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx

--
Mathias Kuhring

Dr. rer. nat.
Bioinformatician
HPC & Core Unit Bioinformatics
Berlin Institute of Health at Charité (BIH)

E-Mail: mathias.kuhring@xxxxxxxxxxxxxx
Mobile: +49 172 3475576