Cephadm actually builds the list of daemons on a host by looking at
subdirectories of /var/lib/ceph/. "cephadm:v1" type daemons correspond to
directories within /var/lib/ceph/<fsid>, while "legacy" daemons correspond
to directories of the format /var/lib/ceph/<daemon-type>-<daemon-id>, where
<daemon-type> is one of "mon", "osd", "mds", "mgr". So, in this case, I'm
guessing that host has a directory like "/var/lib/ceph/mon-osd-mirror-1".
To "remove" the daemon, you should just have to remove that directory.
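As a rough sketch (the directory name above is only my guess, so check what
is actually there and make sure the old mon really is stopped before
deleting anything):

    cephadm ls | grep -A 7 '"style": "legacy"'   # which legacy daemons cephadm still finds on this host
    ls /var/lib/ceph/                            # check what the leftover directory is actually called
    systemctl status ceph-mon@osd-mirror-1       # confirm the old unit is stopped and disabled
    rm -rf /var/lib/ceph/mon-osd-mirror-1        # then remove the directory (adjust to the name you found)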
I will also add that the "Inferring config" issue itself was tracked in
https://tracker.ceph.com/issues/54571 and should be resolved as of 16.2.9
and 17.2.1, so hopefully removing these legacy daemon dirs won't be
necessary in the future.

Thanks,
 - Adam King

On Thu, Jun 23, 2022 at 6:42 AM Kuhring, Mathias
<mathias.kuhring@xxxxxxxxxxxxxx> wrote:

> Hey Adam,
>
> thanks again for your help.
>
> I finally got around to executing your suggested procedure. It went mostly
> fine, except when I renamed the last host, I ended up with a rogue mon.
> I assume a new mon was created on a different host while the last one was
> "out" of cephadm. And the remaining mon on the last host is now not
> cleaned up by cephadm (maybe due to being legacy).
>
> I got the following warnings (relevant sections):
>
>     [WRN] CEPHADM_APPLY_SPEC_FAIL: Failed to apply 1 service(s): osd.all-available-devices
>         osd.all-available-devices: host osd-mirror-1 `cephadm ceph-volume` failed: cephadm exited with an error code: 1, stderr:Inferring config /var/lib/ceph/7efa00f9-182f-40f4-9136-d51895db1f0b/mon.osd-mirror-1/config
>         ERROR: [Errno 2] No such file or directory: '/var/lib/ceph/7efa00f9-182f-40f4-9136-d51895db1f0b/mon.osd-mirror-1/config'
>     [WRN] CEPHADM_REFRESH_FAILED: failed to probe daemons or devices
>         host osd-mirror-1 `cephadm ceph-volume` failed: cephadm exited with an error code: 1, stderr:Inferring config /var/lib/ceph/7efa00f9-182f-40f4-9136-d51895db1f0b/mon.osd-mirror-1/config
>         ERROR: [Errno 2] No such file or directory: '/var/lib/ceph/7efa00f9-182f-40f4-9136-d51895db1f0b/mon.osd-mirror-1/config'
>
> The mon is not running and not required anymore. It's no longer listed via
> `systemctl`, `ceph orch ps` or `ceph status`:
>
>     mon: 3 daemons, quorum osd-mirror-3,osd-mirror-2,osd-mirror-6 (age 3m)
>
> But cephadm is still aware of the late daemon and trying to use (?) it.
> From `cephadm ls`:
>
>     {
>         "style": "legacy",
>         "name": "mon.osd-mirror-1",
>         "fsid": "7efa00f9-182f-40f4-9136-d51895db1f0b",
>         "systemd_unit": "ceph-mon@osd-mirror-1",
>         "enabled": false,
>         "state": "unknown",
>         "host_version": "15.2.14"
>     },
>
> Tried to remove it (which didn't help):
>
>     cephadm rm-daemon --name mon.osd-mirror-1 --fsid 7efa00f9-182f-40f4-9136-d51895db1f0b --force --force-delete-data
>
> And then figured out cephadm is not supposed to remove legacy daemons:
> https://tracker.ceph.com/issues/45976
>
> Also tried some manual removal without success:
>
>     0|0[root@osd-mirror-1 ~]# service ceph -a stop mon.osd-mirror-1
>     The service command supports only basic LSB actions (start, stop, restart, try-restart, reload, force-reload, status). For other actions, please try to use systemctl.
>     0|0[root@osd-mirror-1 ~]# ceph mon remove osd-mirror-1
>     mon.osd-mirror-1 does not exist or has already been removed
>
> What other options do I have to remove this daemon, i.e. the rest of the
> information cephadm keeps that makes it think the mon is still available?
>
> Thanks again for all your help.
>
> Best, Mathias
>
> On 5/20/2022 5:16 PM, Adam King wrote:
>
> To clarify a bit, "ceph orch host rm <hostname> --force" won't actually
> touch any of the daemons on the host. It just stops cephadm from managing
> the host, i.e. it won't add/remove daemons on the host. If you remove the
> host and then re-add it with the new host name, nothing should actually
> happen to the daemons there. The only possible exception is if you have
> services whose placement uses a count and one of the daemons from that
> service is on the host being temporarily removed. It's possible it could
> try to deploy that daemon on another host in the interim. However, OSDs
> are never placed like that, so there would never be any need for flags
> like noout or nobackfill. The worst case would be it moving a mon or mgr
> around. If you make sure all the important services are deployed by label,
> explicit hosts, etc. (just not count), then there should be no risk of any
> daemons moving at all and this is a pretty safe operation.
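> For example, something along these lines (just a sketch; the "mon" label
> name is only an example, adjust it to however you want to pin things):
>
>     ceph orch ls --export                          # review how each service's placement is currently specified
>     ceph orch host label add osd-mirror-2 mon      # label the hosts that should keep the mon daemons
>     ceph orch apply mon --placement="label:mon"    # pin mons to labelled hosts instead of a bare count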
> On Fri, May 20, 2022 at 3:36 AM Kuhring, Mathias
> <mathias.kuhring@xxxxxxxxxxxxxx> wrote:
>
>> Hey Adam,
>>
>> thanks for your fast reply.
>>
>> That's a bit more invasive and risky than I was hoping for.
>> But if this is the only way, I guess we need to do this.
>>
>> Would it be advisable to set some maintenance flags like noout,
>> nobackfill, norebalance? And maybe stop the ceph target on the host I'm
>> re-adding to pause all daemons?
>>
>> Best, Mathias
>>
>> On 5/19/2022 8:14 PM, Adam King wrote:
>>
>> cephadm just takes the hostname given in the "ceph orch host add"
>> commands and assumes it won't change. The FQDN names (or whatever "ceph
>> orch host ls" shows in any scenario) come from whatever input was given
>> in those commands. Cephadm will even try to verify the hostname matches
>> what is given when adding the host. As for where it is stored, we keep
>> that info in the mon key store, and it isn't meant to be manually updated
>> (ceph config-key get mgr/cephadm/inventory). That said, there have
>> occasionally been people running into issues related to a mismatch
>> between an FQDN and a short name. There's no built-in command for
>> changing a hostname because of the expectation that it won't change.
>> However, you should be able to fix this by removing and re-adding the
>> host, e.g. "ceph orch host rm osd-mirror-1.our.domain.org" followed by
>> "ceph orch host add osd-mirror-1 172.16.62.22 --labels rgw --labels osd".
>> If you're on a late enough version that it asks you to drain the host
>> before it will remove it (that was added in some Pacific dot release, I
>> don't remember which one), you can pass --force to the host rm command.
>> Generally it's not a good idea to remove hosts from cephadm's control
>> while there are still cephadm-deployed daemons on them like that, but
>> this is a special case. Anyway, removing and re-adding the host is the
>> only (reasonable) way I can remember to change what it has stored for the
>> hostname.
>>
>> Let me know if that doesn't work,
>> - Adam King
>>
>> On Thu, May 19, 2022 at 1:41 PM Kuhring, Mathias
>> <mathias.kuhring@xxxxxxxxxxxxxx> wrote:
>>
>>> Dear ceph users,
>>>
>>> one of our clusters is complaining about plenty of stray hosts and
>>> daemons. Pretty much all of them.
>>>
>>>     [WRN] CEPHADM_STRAY_HOST: 6 stray host(s) with 280 daemon(s) not managed by cephadm
>>>         stray host osd-mirror-1 has 47 stray daemons: ['mgr.osd-mirror-1.ltmyyh', 'mon.osd-mirror-1', 'osd.1', ...]
>>>         stray host osd-mirror-2 has 46 stray daemons: ['mon.osd-mirror-2', 'osd.0', ...]
>>>         stray host osd-mirror-3 has 48 stray daemons: ['cephfs-mirror.osd-mirror-3.qzcuvv', 'mgr.osd-mirror-3', 'mon.osd-mirror-3', 'osd.101', ...]
>>>         stray host osd-mirror-4 has 47 stray daemons: ['mds.cephfs.osd-mirror-4.omjlxu', 'mgr.osd-mirror-4', 'osd.103', ...]
>>>         stray host osd-mirror-5 has 46 stray daemons: ['mgr.osd-mirror-5', 'osd.139', ...]
>>>         stray host osd-mirror-6 has 46 stray daemons: ['mds.cephfs.osd-mirror-6.hobjsy', 'osd.141', ...]
>>>
>>> It all seems to boil down to the host names from `ceph orch host ls` not
>>> matching the other configurations.
>>>
>>>     ceph orch host ls
>>>     HOST                         ADDR          LABELS       STATUS
>>>     osd-mirror-1.our.domain.org  172.16.62.22  rgw osd
>>>     osd-mirror-2.our.domain.org  172.16.62.23  rgw osd
>>>     osd-mirror-3.our.domain.org  172.16.62.24  rgw osd
>>>     osd-mirror-4.our.domain.org  172.16.62.25  rgw mds osd
>>>     osd-mirror-5.our.domain.org  172.16.62.32  rgw osd
>>>     osd-mirror-6.our.domain.org  172.16.62.33  rgw mds osd
>>>
>>>     hostname
>>>     osd-mirror-6
>>>
>>>     hostname -f
>>>     osd-mirror-6.our.domain.org
>>>
>>>     0|0[root@osd-mirror-6 ~]# ceph mon metadata | grep "\"hostname\""
>>>         "hostname": "osd-mirror-1",
>>>         "hostname": "osd-mirror-3",
>>>         "hostname": "osd-mirror-2",
>>>
>>>     0|1[root@osd-mirror-6 ~]# ceph mgr metadata | grep "\"hostname\""
>>>         "hostname": "osd-mirror-1",
>>>         "hostname": "osd-mirror-3",
>>>         "hostname": "osd-mirror-4",
>>>         "hostname": "osd-mirror-5",
>>>
>>> The documentation states that "cephadm demands that the name of host
>>> given via `ceph orch host add` equals the output of `hostname` on remote
>>> hosts.":
>>>
>>> https://docs.ceph.com/en/latest/cephadm/host-management/#fully-qualified-domain-names-vs-bare-host-names
>>> https://docs.ceph.com/en/octopus/cephadm/concepts/?#fully-qualified-domain-names-vs-bare-host-names
>>>
>>> But it seems our cluster wasn't set up like this.
>>>
>>> How can I now change the host names which were assigned when adding the
>>> hosts with `ceph orch host add HOSTNAME`?
>>>
>>> I can't seem to find any documentation on changing the host names which
>>> are listed by `ceph orch host ls`. All I can find is related to changing
>>> the actual name of the host in the system. The crush map also just
>>> contains the bare host names. So, where are these FQDN names actually
>>> registered?
>>>
>>> Thank you for your help.
>>>
>>> Best regards,
>>> Mathias
>
> --
> Mathias Kuhring
>
> Dr. rer. nat.
> Bioinformatician
> HPC & Core Unit Bioinformatics
> Berlin Institute of Health at Charité (BIH)
>
> E-Mail: mathias.kuhring@xxxxxxxxxxxxxx
> Mobile: +49 172 3475576

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx