Hi Adam,

I have also noticed a very strange thing: duplicate names in the following
output. Is this normal? I don't know how they got here. Is there a way I can
rename them?

root@ceph1:~# ceph orch ps
NAME                 HOST   PORTS        STATUS          REFRESHED  AGE  MEM USE  MEM LIM  VERSION    IMAGE ID      CONTAINER ID
alertmanager.ceph1   ceph1  *:9093,9094  starting        -          -    -        -        <unknown>  <unknown>     <unknown>
crash.ceph2          ceph1               running (13d)   10s ago    13d  10.0M    -        15.2.17    93146564743f  0a009254afb0
crash.ceph2          ceph2               running (13d)   10s ago    13d  10.0M    -        15.2.17    93146564743f  0a009254afb0
grafana.ceph1        ceph1  *:3000       starting        -          -    -        -        <unknown>  <unknown>     <unknown>
mgr.ceph2.hmbdla     ceph1               running (103m)  10s ago    13d  518M     -        16.2.10    0d668911f040  745245c18d5e
mgr.ceph2.hmbdla     ceph2               running (103m)  10s ago    13d  518M     -        16.2.10    0d668911f040  745245c18d5e
node-exporter.ceph2  ceph1               running (7h)    10s ago    13d  70.2M    -        0.18.1     e5a616e4b9cf  d0ba04bb977c
node-exporter.ceph2  ceph2               running (7h)    10s ago    13d  70.2M    -        0.18.1     e5a616e4b9cf  d0ba04bb977c
osd.2                ceph1               running (19h)   10s ago    13d  901M     4096M    15.2.17    93146564743f  e286fb1c6302
osd.2                ceph2               running (19h)   10s ago    13d  901M     4096M    15.2.17    93146564743f  e286fb1c6302
osd.3                ceph1               running (19h)   10s ago    13d  1006M    4096M    15.2.17    93146564743f  d3ae5d9f694f
osd.3                ceph2               running (19h)   10s ago    13d  1006M    4096M    15.2.17    93146564743f  d3ae5d9f694f
osd.5                ceph1               running (19h)   10s ago    9d   222M     4096M    15.2.17    93146564743f  405068fb474e
osd.5                ceph2               running (19h)   10s ago    9d   222M     4096M    15.2.17    93146564743f  405068fb474e
prometheus.ceph1     ceph1  *:9095       running (15s)   10s ago    15s  30.6M    -                   514e6a882f6e  65a0acfed605
prometheus.ceph1     ceph2  *:9095       running (15s)   10s ago    15s  30.6M    -                   514e6a882f6e  65a0acfed605

I found the following example link, which has all different names. How does
cephadm decide naming?
https://achchusnulchikam.medium.com/deploy-ceph-cluster-with-cephadm-on-centos-8-257b300e7b42

On Thu, Sep 1, 2022 at 6:20 PM Satish Patel <satish.txt@xxxxxxxxx> wrote:

> Hi Adam,
>
> Getting the following error, not sure why it's not able to find it.
>
> root@ceph1:~# ceph orch daemon redeploy mgr.ceph1.xmbvsb
> Error EINVAL: Unable to find mgr.ceph1.xmbvsb daemon(s)
>
> On Thu, Sep 1, 2022 at 5:57 PM Adam King <adking@xxxxxxxxxx> wrote:
>
>> what happens if you run `ceph orch daemon redeploy mgr.ceph1.xmbvsb`?
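The duplicated rows usually point at the orchestrator's cached per-host inventory rather than at daemons genuinely existing twice, since `ceph orch ps` is served from that cache. A minimal check, assuming the Octopus/Pacific `ceph orch ps` options (`--refresh` and the optional hostname argument) are available in this build:

    # Ask the orchestrator to rescan the hosts instead of serving cached data.
    ceph orch ps --refresh

    # Then list what it believes is running on each host individually.
    ceph orch ps ceph1
    ceph orch ps ceph2

If the duplicates disappear after a refresh, the naming itself was never the problem: cephadm builds daemon names from the service type plus an id (osd.<id>, or <type>.<host>.<random suffix> for daemons like mgr), and they are not meant to be renamed by hand.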
>>
>> On Thu, Sep 1, 2022 at 5:12 PM Satish Patel <satish.txt@xxxxxxxxx> wrote:
>>
>>> Hi Adam,
>>>
>>> Here is the requested output:
>>>
>>> root@ceph1:~# ceph health detail
>>> HEALTH_WARN 4 stray daemon(s) not managed by cephadm
>>> [WRN] CEPHADM_STRAY_DAEMON: 4 stray daemon(s) not managed by cephadm
>>>     stray daemon mon.ceph1 on host ceph1 not managed by cephadm
>>>     stray daemon osd.0 on host ceph1 not managed by cephadm
>>>     stray daemon osd.1 on host ceph1 not managed by cephadm
>>>     stray daemon osd.4 on host ceph1 not managed by cephadm
>>>
>>> root@ceph1:~# ceph orch host ls
>>> HOST   ADDR         LABELS  STATUS
>>> ceph1  10.73.0.192
>>> ceph2  10.73.3.192  _admin
>>> 2 hosts in cluster
>>>
>>> My cephadm ls output says the mgr is in an error state:
>>>
>>>     {
>>>         "style": "cephadm:v1",
>>>         "name": "mgr.ceph1.xmbvsb",
>>>         "fsid": "f270ad9e-1f6f-11ed-b6f8-a539d87379ea",
>>>         "systemd_unit": "ceph-f270ad9e-1f6f-11ed-b6f8-a539d87379ea@mgr.ceph1.xmbvsb",
>>>         "enabled": true,
>>>         "state": "error",
>>>         "container_id": null,
>>>         "container_image_name": "quay.io/ceph/ceph:v15",
>>>         "container_image_id": null,
>>>         "version": null,
>>>         "started": null,
>>>         "created": "2022-09-01T20:59:49.314347Z",
>>>         "deployed": "2022-09-01T20:59:48.718347Z",
>>>         "configured": "2022-09-01T20:59:49.314347Z"
>>>     },
>>>
>>> And I'm getting this error:
>>>
>>> root@ceph1:~# cephadm unit --fsid f270ad9e-1f6f-11ed-b6f8-a539d87379ea --name mgr.ceph1.xmbvsb start
>>> stderr Job for ceph-f270ad9e-1f6f-11ed-b6f8-a539d87379ea@mgr.ceph1.xmbvsb.service failed because the control process exited with error code.
>>> stderr See "systemctl status ceph-f270ad9e-1f6f-11ed-b6f8-a539d87379ea@mgr.ceph1.xmbvsb.service" and "journalctl -xe" for details.
>>> Traceback (most recent call last):
>>>   File "/usr/sbin/cephadm", line 6250, in <module>
>>>     r = args.func()
>>>   File "/usr/sbin/cephadm", line 1357, in _infer_fsid
>>>     return func()
>>>   File "/usr/sbin/cephadm", line 3727, in command_unit
>>>     call_throws([
>>>   File "/usr/sbin/cephadm", line 1119, in call_throws
>>>     raise RuntimeError('Failed command: %s' % ' '.join(command))
>>> RuntimeError: Failed command: systemctl start ceph-f270ad9e-1f6f-11ed-b6f8-a539d87379ea@mgr.ceph1.xmbvsb
>>>
>>> How do I remove and re-deploy the mgr?
>>>
>>> On Thu, Sep 1, 2022 at 4:54 PM Adam King <adking@xxxxxxxxxx> wrote:
>>>
>>>> cephadm deploys the containers with --rm, so they will get removed if
>>>> you stop them. As for getting the 2nd mgr back: if the 2nd one is still
>>>> listed in `ceph orch ps`, you should be able to do a `ceph orch daemon
>>>> redeploy <mgr-daemon-name>`, where <mgr-daemon-name> should match the
>>>> name given in the orch ps output for the one that isn't actually up. If
>>>> it isn't listed there, then, given you have a count of 2, cephadm should
>>>> deploy another one. I do see in the orch ls output you posted that the
>>>> mgr service says "2/2" running, which implies it believes a 2nd mgr is
>>>> present (and you would therefore be able to try the daemon redeploy if
>>>> that daemon isn't actually there).
>>>>
>>>> Is it still reporting the duplicate OSDs in orch ps? I see in the
>>>> cephadm ls output on ceph1 that osd.2 isn't being reported, even though
>>>> it was reported as being on ceph1 in the orch ps output in your original
>>>> message in this thread. I'm interested in what `ceph health detail` is
>>>> reporting now as well, as it says there are 4 stray daemons. Also, the
>>>> `ceph orch host ls` output, just to get a better grasp of the topology
>>>> of this cluster.
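For the "how do I remove and re-deploy the mgr" question, one possible sequence is sketched below. It is not specific advice for this cluster; the fsid and daemon name are simply copied from the output above. The idea is to first see why systemd refuses to start the unit, and only remove the daemon's on-host state if it is truly unrecoverable, letting the existing mgr count:2 spec place a replacement:

    # Why did the unit fail? The journal usually carries the container's error output.
    journalctl -u ceph-f270ad9e-1f6f-11ed-b6f8-a539d87379ea@mgr.ceph1.xmbvsb.service --no-pager | tail -n 50

    # If it cannot be repaired, drop the daemon's state on ceph1 so cephadm stops tracking it.
    cephadm rm-daemon \
        --fsid f270ad9e-1f6f-11ed-b6f8-a539d87379ea \
        --name mgr.ceph1.xmbvsb \
        --force

    # Re-assert the mgr spec; with count:2 the orchestrator should schedule a fresh mgr.
    ceph orch apply mgr 2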
>>>>
>>>> On Thu, Sep 1, 2022 at 3:50 PM Satish Patel <satish.txt@xxxxxxxxx> wrote:
>>>>
>>>>> Adam,
>>>>>
>>>>> I posted a question related to the upgrade earlier, and this thread is
>>>>> related to that. I opened a new one because I found this error in the
>>>>> logs and thought the upgrade might be stuck because of the duplicate
>>>>> OSDs.
>>>>>
>>>>> root@ceph1:~# ls -l /var/lib/ceph/f270ad9e-1f6f-11ed-b6f8-a539d87379ea/
>>>>> total 44
>>>>> drwx------ 3 nobody nogroup 4096 Aug 19 05:37 alertmanager.ceph1
>>>>> drwx------ 3 167    167     4096 Aug 19 05:36 crash
>>>>> drwx------ 2 167    167     4096 Aug 19 05:37 crash.ceph1
>>>>> drwx------ 4 998    996     4096 Aug 19 05:37 grafana.ceph1
>>>>> drwx------ 2 167    167     4096 Aug 19 05:36 mgr.ceph1.xmbvsb
>>>>> drwx------ 3 167    167     4096 Aug 19 05:36 mon.ceph1
>>>>> drwx------ 2 nobody nogroup 4096 Aug 19 05:37 node-exporter.ceph1
>>>>> drwx------ 2 167    167     4096 Aug 19 17:55 osd.0
>>>>> drwx------ 2 167    167     4096 Aug 19 18:03 osd.1
>>>>> drwx------ 2 167    167     4096 Aug 31 05:20 osd.4
>>>>> drwx------ 4 nobody nogroup 4096 Aug 19 05:38 prometheus.ceph1
>>>>>
>>>>> Here is the output of cephadm ls:
>>>>>
>>>>> root@ceph1:~# cephadm ls
>>>>> [
>>>>>     {
>>>>>         "style": "cephadm:v1",
>>>>>         "name": "alertmanager.ceph1",
>>>>>         "fsid": "f270ad9e-1f6f-11ed-b6f8-a539d87379ea",
>>>>>         "systemd_unit": "ceph-f270ad9e-1f6f-11ed-b6f8-a539d87379ea@alertmanager.ceph1",
>>>>>         "enabled": true,
>>>>>         "state": "running",
>>>>>         "container_id": "97403cf9799711461216b7f83e88c574da2b631c7c65233ebd82d8a216a48924",
>>>>>         "container_image_name": "quay.io/prometheus/alertmanager:v0.20.0",
>>>>>         "container_image_id": "0881eb8f169f5556a292b4e2c01d683172b12830a62a9225a98a8e206bb734f0",
>>>>>         "version": "0.20.0",
>>>>>         "started": "2022-08-19T16:59:02.461978Z",
>>>>>         "created": "2022-08-19T03:37:16.403605Z",
>>>>>         "deployed": "2022-08-19T03:37:15.815605Z",
>>>>>         "configured": "2022-08-19T16:59:02.117607Z"
>>>>>     },
>>>>>     {
>>>>>         "style": "cephadm:v1",
>>>>>         "name": "grafana.ceph1",
>>>>>         "fsid": "f270ad9e-1f6f-11ed-b6f8-a539d87379ea",
>>>>>         "systemd_unit": "ceph-f270ad9e-1f6f-11ed-b6f8-a539d87379ea@grafana.ceph1",
>>>>>         "enabled": true,
>>>>>         "state": "running",
>>>>>         "container_id": "c7136aea8349a37dd9b320acd926c4bcbed95bc4549779e9580ed4290edc2117",
>>>>>         "container_image_name": "quay.io/ceph/ceph-grafana:6.7.4",
>>>>>         "container_image_id": "557c83e11646f123a27b5e4b62ac6c45e7bb8b2e90d6044034d0db5b7019415c",
>>>>>         "version": "6.7.4",
>>>>>         "started": "2022-08-19T03:38:05.481992Z",
>>>>>         "created": "2022-08-19T03:37:46.823604Z",
>>>>>         "deployed": "2022-08-19T03:37:46.239604Z",
>>>>>         "configured": "2022-08-19T03:38:05.163603Z"
>>>>>     },
>>>>>     {
>>>>>         "style": "cephadm:v1",
>>>>>         "name": "osd.1",
>>>>>         "fsid": "f270ad9e-1f6f-11ed-b6f8-a539d87379ea",
>>>>>         "systemd_unit": "ceph-f270ad9e-1f6f-11ed-b6f8-a539d87379ea@osd.1",
>>>>>         "enabled": true,
>>>>>         "state": "running",
>>>>>         "container_id": "51586b775bda0485c8b27b8401ac2430570e6f42cb7e12bae3eea05064f1fd20",
>>>>>         "container_image_name": "quay.io/ceph/ceph:v15",
>>>>>         "container_image_id": "93146564743febec815d6a764dad93fc07ce971e88315403ac508cb5da6d35f4",
>>>>>         "version": "15.2.17",
>>>>>         "started": "2022-08-19T16:03:10.612432Z",
>>>>>         "created": "2022-08-19T16:03:09.765746Z",
>>>>>         "deployed": "2022-08-19T16:03:09.141746Z",
>>>>>         "configured": "2022-08-31T02:53:34.224643Z"
>>>>>     },
>>>>>     {
>>>>>         "style": "cephadm:v1",
>>>>>         "name": "prometheus.ceph1",
>>>>>         "fsid": "f270ad9e-1f6f-11ed-b6f8-a539d87379ea",
>>>>>         "systemd_unit": "ceph-f270ad9e-1f6f-11ed-b6f8-a539d87379ea@prometheus.ceph1",
>>>>>         "enabled": true,
>>>>>         "state": "running",
>>>>>         "container_id": "ba305236e5db9f2095b23b86a2340924909e9e8e54e5cdbe1d51c14dc4c8587a",
>>>>>         "container_image_name": "quay.io/prometheus/prometheus:v2.18.1",
>>>>>         "container_image_id": "de242295e2257c37c8cadfd962369228f8f10b2d48a44259b65fef44ad4f6490",
>>>>>         "version": "2.18.1",
>>>>>         "started": "2022-08-19T16:59:03.538981Z",
>>>>>         "created": "2022-08-19T03:38:01.567604Z",
>>>>>         "deployed": "2022-08-19T03:38:00.983603Z",
>>>>>         "configured": "2022-08-19T16:59:03.193607Z"
>>>>>     },
>>>>>     {
>>>>>         "style": "cephadm:v1",
>>>>>         "name": "node-exporter.ceph1",
>>>>>         "fsid": "f270ad9e-1f6f-11ed-b6f8-a539d87379ea",
>>>>>         "systemd_unit": "ceph-f270ad9e-1f6f-11ed-b6f8-a539d87379ea@node-exporter.ceph1",
>>>>>         "enabled": true,
>>>>>         "state": "running",
>>>>>         "container_id": "00bf3ad29cce79e905e8533648ef38cbd232990fa9616aff1c0020b7b66d0cc0",
>>>>>         "container_image_name": "quay.io/prometheus/node-exporter:v0.18.1",
>>>>>         "container_image_id": "e5a616e4b9cf68dfcad7782b78e118be4310022e874d52da85c55923fb615f87",
>>>>>         "version": "0.18.1",
>>>>>         "started": "2022-08-19T03:37:55.232032Z",
>>>>>         "created": "2022-08-19T03:37:47.711604Z",
>>>>>         "deployed": "2022-08-19T03:37:47.155604Z",
>>>>>         "configured": "2022-08-19T03:37:47.711604Z"
>>>>>     },
>>>>>     {
>>>>>         "style": "cephadm:v1",
>>>>>         "name": "osd.0",
>>>>>         "fsid": "f270ad9e-1f6f-11ed-b6f8-a539d87379ea",
>>>>>         "systemd_unit": "ceph-f270ad9e-1f6f-11ed-b6f8-a539d87379ea@osd.0",
>>>>>         "enabled": true,
>>>>>         "state": "running",
>>>>>         "container_id": "6b69046972dfbdb53665228258a15b13bc13a462ca4e066a4eca0cd593442d2d",
>>>>>         "container_image_name": "quay.io/ceph/ceph:v15",
>>>>>         "container_image_id": "93146564743febec815d6a764dad93fc07ce971e88315403ac508cb5da6d35f4",
>>>>>         "version": "15.2.17",
>>>>>         "started": "2022-08-19T15:55:20.580157Z",
>>>>>         "created": "2022-08-19T15:55:19.725766Z",
>>>>>         "deployed": "2022-08-19T15:55:19.125766Z",
>>>>>         "configured": "2022-08-31T02:53:34.760643Z"
>>>>>     },
>>>>>     {
>>>>>         "style": "cephadm:v1",
>>>>>         "name": "crash.ceph1",
>>>>>         "fsid": "f270ad9e-1f6f-11ed-b6f8-a539d87379ea",
>>>>>         "systemd_unit": "ceph-f270ad9e-1f6f-11ed-b6f8-a539d87379ea@crash.ceph1",
>>>>>         "enabled": true,
>>>>>         "state": "running",
>>>>>         "container_id": "6bc56f478ccb96841fe86a540e284c175300b83dad9e906ae3230f22341c8293",
>>>>>         "container_image_name": "quay.io/ceph/ceph:v15",
>>>>>         "container_image_id": "93146564743febec815d6a764dad93fc07ce971e88315403ac508cb5da6d35f4",
>>>>>         "version": "15.2.17",
>>>>>         "started": "2022-08-19T03:37:17.660080Z",
>>>>>         "created": "2022-08-19T03:37:17.559605Z",
>>>>>         "deployed": "2022-08-19T03:37:16.991605Z",
>>>>>         "configured": "2022-08-19T03:37:17.559605Z"
>>>>>     },
>>>>>     {
>>>>>         "style": "cephadm:v1",
>>>>>         "name": "mon.ceph1",
>>>>>         "fsid": "f270ad9e-1f6f-11ed-b6f8-a539d87379ea",
>>>>>         "systemd_unit": "ceph-f270ad9e-1f6f-11ed-b6f8-a539d87379ea@mon.ceph1",
>>>>>         "enabled": true,
>>>>>         "state": "running",
>>>>>         "container_id": "d0f03130491daebbe783c4990c6a4383d49e7a0e2bdf8c5d1eed012865e5d875",
>>>>>         "container_image_name": "quay.io/ceph/ceph:v15",
>>>>>         "container_image_id": "93146564743febec815d6a764dad93fc07ce971e88315403ac508cb5da6d35f4",
>>>>>         "version": "15.2.17",
>>>>>         "started": "2022-08-19T03:36:21.804129Z",
>>>>>         "created": "2022-08-19T03:36:19.743608Z",
>>>>>         "deployed": "2022-08-19T03:36:18.439608Z",
>>>>>         "configured": "2022-08-19T03:38:05.931603Z"
>>>>>     },
>>>>>     {
>>>>>         "style": "cephadm:v1",
>>>>>         "name": "mgr.ceph1.xmbvsb",
>>>>>         "fsid": "f270ad9e-1f6f-11ed-b6f8-a539d87379ea",
>>>>>         "systemd_unit": "ceph-f270ad9e-1f6f-11ed-b6f8-a539d87379ea@mgr.ceph1.xmbvsb",
>>>>>         "enabled": true,
>>>>>         "state": "stopped",
>>>>>         "container_id": null,
>>>>>         "container_image_name": "quay.io/ceph/ceph:v15",
>>>>>         "container_image_id": null,
>>>>>         "version": null,
>>>>>         "started": null,
>>>>>         "created": "2022-08-19T03:36:22.815608Z",
>>>>>         "deployed": "2022-08-19T03:36:22.239608Z",
>>>>>         "configured": "2022-08-19T03:38:06.487603Z"
>>>>>     },
>>>>>     {
>>>>>         "style": "cephadm:v1",
>>>>>         "name": "osd.4",
>>>>>         "fsid": "f270ad9e-1f6f-11ed-b6f8-a539d87379ea",
>>>>>         "systemd_unit": "ceph-f270ad9e-1f6f-11ed-b6f8-a539d87379ea@osd.4",
>>>>>         "enabled": true,
>>>>>         "state": "running",
>>>>>         "container_id": "938840fe7fd0cb45cc26d077837c9847d7c7a7a68c7e1588d4bb4343c695a071",
>>>>>         "container_image_name": "quay.io/ceph/ceph:v15",
>>>>>         "container_image_id": "93146564743febec815d6a764dad93fc07ce971e88315403ac508cb5da6d35f4",
>>>>>         "version": "15.2.17",
>>>>>         "started": "2022-08-31T03:20:55.416219Z",
>>>>>         "created": "2022-08-23T21:46:49.458533Z",
>>>>>         "deployed": "2022-08-23T21:46:48.818533Z",
>>>>>         "configured": "2022-08-31T02:53:41.196643Z"
>>>>>     }
>>>>> ]
>>>>>
>>>>> I have noticed one more thing: I did docker stop <container_id_of_mgr>
>>>>> on the ceph1 node, and now my mgr container has disappeared. I can't
>>>>> see it anywhere, and I'm not sure how to bring the mgr back, because
>>>>> the upgrade won't let me do anything if I don't have two mgr instances.
>>>>>
>>>>> root@ceph1:~# ceph -s
>>>>>   cluster:
>>>>>     id:     f270ad9e-1f6f-11ed-b6f8-a539d87379ea
>>>>>     health: HEALTH_WARN
>>>>>             4 stray daemon(s) not managed by cephadm
>>>>>
>>>>>   services:
>>>>>     mon: 1 daemons, quorum ceph1 (age 17h)
>>>>>     mgr: ceph2.hmbdla(active, since 5h)
>>>>>     osd: 6 osds: 6 up (since 40h), 6 in (since 8d)
>>>>>
>>>>>   data:
>>>>>     pools:   6 pools, 161 pgs
>>>>>     objects: 20.59k objects, 85 GiB
>>>>>     usage:   174 GiB used, 826 GiB / 1000 GiB avail
>>>>>     pgs:     161 active+clean
>>>>>
>>>>>   io:
>>>>>     client:   0 B/s rd, 12 KiB/s wr, 0 op/s rd, 2 op/s wr
>>>>>
>>>>>   progress:
>>>>>     Upgrade to quay.io/ceph/ceph:16.2.10 (0s)
>>>>>       [............................]
>>>>>
>>>>> I can see mgr count:2, but I'm not sure how to bring it back.
>>>>>
>>>>> root@ceph1:~# ceph orch ls
>>>>> NAME                       PORTS        RUNNING  REFRESHED  AGE  PLACEMENT
>>>>> alertmanager               ?:9093,9094      1/1  20s ago    13d  count:1
>>>>> crash                                       2/2  20s ago    13d  *
>>>>> grafana                    ?:3000           1/1  20s ago    13d  count:1
>>>>> mgr                                         2/2  20s ago    13d  count:2
>>>>> mon                                         0/5  -          13d  <unmanaged>
>>>>> node-exporter              ?:9100           2/2  20s ago    13d  *
>>>>> osd                                           6  20s ago    -    <unmanaged>
>>>>> osd.all-available-devices                     0  -          13d  *
>>>>> osd.osd_spec_default                          0  -          8d   *
>>>>> prometheus                 ?:9095           1/1  20s ago    13d  count:1
>>>>>
>>>>> On Thu, Sep 1, 2022 at 12:28 PM Adam King <adking@xxxxxxxxxx> wrote:
>>>>>
>>>>>> Are there any extra directories in /var/lib/ceph or /var/lib/ceph/<fsid>
>>>>>> that appear to be for those OSDs on that host? When cephadm builds the
>>>>>> info it uses for "ceph orch ps", it's actually scraping those
>>>>>> directories. The output of "cephadm ls" on the host with the duplicates
>>>>>> could also potentially have some insights.
>>>>>>
>>>>>> On Thu, Sep 1, 2022 at 12:15 PM Satish Patel <satish.txt@xxxxxxxxx> wrote:
>>>>>>
>>>>>>> Folks,
>>>>>>>
>>>>>>> I am playing with cephadm, and life was good until I started upgrading
>>>>>>> from Octopus to Pacific. My upgrade process got stuck after upgrading
>>>>>>> the mgr, and in the logs I can now see the following error:
>>>>>>>
>>>>>>> root@ceph1:~# ceph log last cephadm
>>>>>>> 2022-09-01T14:40:45.739804+0000 mgr.ceph2.hmbdla (mgr.265806) 8 : cephadm [INF] Deploying daemon grafana.ceph1 on ceph1
>>>>>>> 2022-09-01T14:40:56.115693+0000 mgr.ceph2.hmbdla (mgr.265806) 14 : cephadm [INF] Deploying daemon prometheus.ceph1 on ceph1
>>>>>>> 2022-09-01T14:41:11.856725+0000 mgr.ceph2.hmbdla (mgr.265806) 25 : cephadm [INF] Reconfiguring alertmanager.ceph1 (dependencies changed)...
>>>>>>> 2022-09-01T14:41:11.861535+0000 mgr.ceph2.hmbdla (mgr.265806) 26 : cephadm [INF] Reconfiguring daemon alertmanager.ceph1 on ceph1
>>>>>>> 2022-09-01T14:41:12.927852+0000 mgr.ceph2.hmbdla (mgr.265806) 27 : cephadm [INF] Reconfiguring grafana.ceph1 (dependencies changed)...
>>>>>>> 2022-09-01T14:41:12.940615+0000 mgr.ceph2.hmbdla (mgr.265806) 28 : cephadm [INF] Reconfiguring daemon grafana.ceph1 on ceph1
>>>>>>> 2022-09-01T14:41:14.056113+0000 mgr.ceph2.hmbdla (mgr.265806) 33 : cephadm [INF] Found duplicate OSDs: osd.2 in status running on ceph1, osd.2 in status running on ceph2
>>>>>>> 2022-09-01T14:41:14.056437+0000 mgr.ceph2.hmbdla (mgr.265806) 34 : cephadm [INF] Found duplicate OSDs: osd.5 in status running on ceph1, osd.5 in status running on ceph2
>>>>>>> 2022-09-01T14:41:14.056630+0000 mgr.ceph2.hmbdla (mgr.265806) 35 : cephadm [INF] Found duplicate OSDs: osd.3 in status running on ceph1, osd.3 in status running on ceph2
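One way to cross-check those "Found duplicate OSDs" messages is to ask the cluster itself which host each of the named OSDs runs on, independently of the cephadm cache. A short sketch, with the OSD ids taken from the log lines above:

    for id in 2 3 5; do
        echo "== osd.$id =="
        ceph osd find "$id" | grep '"host"'          # host according to the cluster's view
        ceph osd metadata "$id" | grep '"hostname"'  # host the OSD daemon itself reported
    done

If both agree that osd.2, osd.3, and osd.5 live only on ceph2, that matches the `ceph osd tree` output below and would suggest the duplicate rows are a reporting artifact rather than real extra daemons.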
>>>>>>>
>>>>>>> Not sure where the duplicate names came from or how that happened. In
>>>>>>> the following output I can't see any duplication:
>>>>>>>
>>>>>>> root@ceph1:~# ceph osd tree
>>>>>>> ID  CLASS  WEIGHT   TYPE NAME       STATUS  REWEIGHT  PRI-AFF
>>>>>>> -1         0.97656  root default
>>>>>>> -3         0.48828      host ceph1
>>>>>>>  4    hdd  0.09769          osd.4       up   1.00000  1.00000
>>>>>>>  0    ssd  0.19530          osd.0       up   1.00000  1.00000
>>>>>>>  1    ssd  0.19530          osd.1       up   1.00000  1.00000
>>>>>>> -5         0.48828      host ceph2
>>>>>>>  5    hdd  0.09769          osd.5       up   1.00000  1.00000
>>>>>>>  2    ssd  0.19530          osd.2       up   1.00000  1.00000
>>>>>>>  3    ssd  0.19530          osd.3       up   1.00000  1.00000
>>>>>>>
>>>>>>> But at the same time I can see duplicate OSD numbers on ceph1 and ceph2:
>>>>>>>
>>>>>>> root@ceph1:~# ceph orch ps
>>>>>>> NAME                 HOST   PORTS        STATUS         REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
>>>>>>> alertmanager.ceph1   ceph1  *:9093,9094  running (20s)  2s ago     20s  17.1M    -                 ba2b418f427c  856a4fe641f1
>>>>>>> alertmanager.ceph1   ceph2  *:9093,9094  running (20s)  3s ago     20s  17.1M    -                 ba2b418f427c  856a4fe641f1
>>>>>>> crash.ceph2          ceph1               running (12d)  2s ago     12d  10.0M    -        15.2.17  93146564743f  0a009254afb0
>>>>>>> crash.ceph2          ceph2               running (12d)  3s ago     12d  10.0M    -        15.2.17  93146564743f  0a009254afb0
>>>>>>> grafana.ceph1        ceph1  *:3000       running (18s)  2s ago     19s  47.9M    -        8.3.5    dad864ee21e9  7d7a70b8ab7f
>>>>>>> grafana.ceph1        ceph2  *:3000       running (18s)  3s ago     19s  47.9M    -        8.3.5    dad864ee21e9  7d7a70b8ab7f
>>>>>>> mgr.ceph2.hmbdla     ceph1               running (13h)  2s ago     12d  506M     -        16.2.10  0d668911f040  6274723c35f7
>>>>>>> mgr.ceph2.hmbdla     ceph2               running (13h)  3s ago     12d  506M     -        16.2.10  0d668911f040  6274723c35f7
>>>>>>> node-exporter.ceph2  ceph1               running (91m)  2s ago     12d  60.7M    -        0.18.1   e5a616e4b9cf  d0ba04bb977c
>>>>>>> node-exporter.ceph2  ceph2               running (91m)  3s ago     12d  60.7M    -        0.18.1   e5a616e4b9cf  d0ba04bb977c
>>>>>>> osd.2                ceph1               running (12h)  2s ago     12d  867M     4096M    15.2.17  93146564743f  e286fb1c6302
>>>>>>> osd.2                ceph2               running (12h)  3s ago     12d  867M     4096M    15.2.17  93146564743f  e286fb1c6302
>>>>>>> osd.3                ceph1               running (12h)  2s ago     12d  978M     4096M    15.2.17  93146564743f  d3ae5d9f694f
>>>>>>> osd.3                ceph2               running (12h)  3s ago     12d  978M     4096M    15.2.17  93146564743f  d3ae5d9f694f
>>>>>>> osd.5                ceph1               running (12h)  2s ago     8d   225M     4096M    15.2.17  93146564743f  405068fb474e
>>>>>>> osd.5                ceph2               running (12h)  3s ago     8d   225M     4096M    15.2.17  93146564743f  405068fb474e
>>>>>>> prometheus.ceph1     ceph1  *:9095       running (8s)   2s ago     8s   30.4M    -                 514e6a882f6e  9031dbe30cae
>>>>>>> prometheus.ceph1     ceph2  *:9095       running (8s)   3s ago     8s   30.4M    -                 514e6a882f6e  9031dbe30cae
>>>>>>>
>>>>>>> Is this a bug, or did I do something wrong? Any workaround to get out
>>>>>>> of this condition?
>>>>>>>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx