I don't think the number of mons should have any effect on this. Looking at your logs, the interesting thing is that all the messages are so close together. Was this from before you stopped the upgrade? On Fri, Sep 2, 2022 at 2:53 PM Satish Patel <satish.txt@xxxxxxxxx> wrote: > Do you think this is because I have only a single MON daemon running? I > have only two nodes. > > On Fri, Sep 2, 2022 at 2:39 PM Satish Patel <satish.txt@xxxxxxxxx> wrote: > >> Adam, >> >> I have enabled debug logging and my logs are flooded with the following. I am going to >> try some of the suggestions from the thread you provided and see how it goes. >> >> root@ceph1:~# tail -f >> /var/log/ceph/f270ad9e-1f6f-11ed-b6f8-a539d87379ea/ceph.cephadm.log >> 2022-09-02T18:38:21.754391+0000 mgr.ceph2.huidoh (mgr.344392) 211198 : >> cephadm [DBG] 0 OSDs are scheduled for removal: [] >> 2022-09-02T18:38:21.754519+0000 mgr.ceph2.huidoh (mgr.344392) 211199 : >> cephadm [DBG] Saving [] to store >> 2022-09-02T18:38:21.757155+0000 mgr.ceph2.huidoh (mgr.344392) 211200 : >> cephadm [DBG] refreshing hosts and daemons >> 2022-09-02T18:38:21.758065+0000 mgr.ceph2.huidoh (mgr.344392) 211201 : >> cephadm [DBG] _check_for_strays >> 2022-09-02T18:38:21.758334+0000 mgr.ceph2.huidoh (mgr.344392) 211202 : >> cephadm [DBG] 0 OSDs are scheduled for removal: [] >> 2022-09-02T18:38:21.758455+0000 mgr.ceph2.huidoh (mgr.344392) 211203 : >> cephadm [DBG] Saving [] to store >> 2022-09-02T18:38:21.761001+0000 mgr.ceph2.huidoh (mgr.344392) 211204 : >> cephadm [DBG] refreshing hosts and daemons >> 2022-09-02T18:38:21.762092+0000 mgr.ceph2.huidoh (mgr.344392) 211205 : >> cephadm [DBG] _check_for_strays >> 2022-09-02T18:38:21.762357+0000 mgr.ceph2.huidoh (mgr.344392) 211206 : >> cephadm [DBG] 0 OSDs are scheduled for removal: [] >> 2022-09-02T18:38:21.762480+0000 mgr.ceph2.huidoh (mgr.344392) 211207 : >> cephadm [DBG] Saving [] to store >> >> On Fri, Sep 2, 2022 at 12:17 PM Adam King <adking@xxxxxxxxxx> wrote: >> >>> Hmm, okay. It seems like cephadm is stuck in general rather than an >>> issue specific to the upgrade. I'd first make sure the orchestrator isn't >>> paused (just running "ceph orch resume" should be enough; it's idempotent). >>> >>> Beyond that, someone else had an issue with things getting >>> stuck that was resolved in this thread, which might be worth a look: >>> https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/NKKLV5TMHFA3ERGCMJ3M7BVLA5PGIR4M/#NKKLV5TMHFA3ERGCMJ3M7BVLA5PGIR4M >>> >>> If you haven't already, stopping the upgrade is probably a good >>> idea, as it may be interfering with cephadm getting to the point where it >>> does the redeploy. >>> >>> If none of those help, it might be worth setting the log level to debug >>> and seeing where things are ending up: run "ceph config set mgr >>> mgr/cephadm/log_to_cluster_level debug; ceph orch ps --refresh", then >>> wait a few minutes before running "ceph log last 100 debug cephadm" (not >>> 100% sure on the format of that command; if it fails, try just "ceph log last >>> cephadm"). We could maybe get more info on why it's not performing the >>> redeploy from those debug logs. Just remember to set the log level back >>> afterwards with 'ceph config set mgr mgr/cephadm/log_to_cluster_level info', as debug >>> logs are quite verbose.
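>>> In other words, the debug cycle would look roughly like this (just a sketch; as noted above, the "ceph log last" format may need adjusting):
>>>
>>> # turn cephadm debug logging on and trigger a refresh
>>> ceph config set mgr mgr/cephadm/log_to_cluster_level debug
>>> ceph orch ps --refresh
>>> # wait a few minutes, then read the recent cephadm entries from the cluster log
>>> ceph log last 100 debug cephadm
>>> # set the level back once done, since debug is very verbose
>>> ceph config set mgr mgr/cephadm/log_to_cluster_level info
>>>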
>>> >>> On Fri, Sep 2, 2022 at 11:39 AM Satish Patel <satish.txt@xxxxxxxxx> >>> wrote: >>> >>>> Hi Adam, >>>> >>>> As you said, i did following >>>> >>>> $ ceph orch daemon redeploy mgr.ceph1.smfvfd >>>> quay.io/ceph/ceph:v16.2.10 >>>> >>>> Noticed following line in logs but then no activity nothing, still >>>> standby mgr running in older version >>>> >>>> 2022-09-02T15:35:45.753093+0000 mgr.ceph2.huidoh (mgr.344392) 2226 : >>>> cephadm [INF] Schedule redeploy daemon mgr.ceph1.smfvfd >>>> 2022-09-02T15:36:17.279190+0000 mgr.ceph2.huidoh (mgr.344392) 2245 : >>>> cephadm [INF] refreshing ceph2 facts >>>> 2022-09-02T15:36:17.984478+0000 mgr.ceph2.huidoh (mgr.344392) 2246 : >>>> cephadm [INF] refreshing ceph1 facts >>>> 2022-09-02T15:37:17.663730+0000 mgr.ceph2.huidoh (mgr.344392) 2284 : >>>> cephadm [INF] refreshing ceph2 facts >>>> 2022-09-02T15:37:18.386586+0000 mgr.ceph2.huidoh (mgr.344392) 2285 : >>>> cephadm [INF] refreshing ceph1 facts >>>> >>>> I am not seeing any image get downloaded also >>>> >>>> root@ceph1:~# docker image ls >>>> REPOSITORY TAG IMAGE ID CREATED >>>> SIZE >>>> quay.io/ceph/ceph v15 93146564743f 3 weeks >>>> ago 1.2GB >>>> quay.io/ceph/ceph-grafana 8.3.5 dad864ee21e9 4 months >>>> ago 558MB >>>> quay.io/prometheus/prometheus v2.33.4 514e6a882f6e 6 months >>>> ago 204MB >>>> quay.io/prometheus/alertmanager v0.23.0 ba2b418f427c 12 months >>>> ago 57.5MB >>>> quay.io/ceph/ceph-grafana 6.7.4 557c83e11646 13 months >>>> ago 486MB >>>> quay.io/prometheus/prometheus v2.18.1 de242295e225 2 years >>>> ago 140MB >>>> quay.io/prometheus/alertmanager v0.20.0 0881eb8f169f 2 years >>>> ago 52.1MB >>>> quay.io/prometheus/node-exporter v0.18.1 e5a616e4b9cf 3 years >>>> ago 22.9MB >>>> >>>> >>>> On Fri, Sep 2, 2022 at 11:06 AM Adam King <adking@xxxxxxxxxx> wrote: >>>> >>>>> hmm, at this point, maybe we should just try manually upgrading the >>>>> mgr daemons and then move from there. First, just stop the upgrade "ceph >>>>> orch upgrade stop". If you figure out which of the two mgr daemons is the >>>>> standby (it should say which one is active in "ceph -s" output) and then do >>>>> a "ceph orch daemon redeploy <standby-mgr-name> >>>>> quay.io/ceph/ceph:v16.2.10" it should redeploy that specific mgr with >>>>> the new version. You could then do a "ceph mgr fail" to swap which of the >>>>> mgr daemons is active, then do another "ceph orch daemon redeploy >>>>> <standby-mgr-name> quay.io/ceph/ceph:v16.2.10" where the standby is >>>>> now the other mgr still on 15.2.17. Once the mgr daemons are both upgraded >>>>> to the new version, run a "ceph orch redeploy mgr" and then "ceph orch >>>>> upgrade start --image quay.io/ceph/ceph:v16.2.10" and see if it goes >>>>> better. >>>>> >>>>> On Fri, Sep 2, 2022 at 10:36 AM Satish Patel <satish.txt@xxxxxxxxx> >>>>> wrote: >>>>> >>>>>> Hi Adam, >>>>>> >>>>>> I run the following command to upgrade but it looks like nothing is >>>>>> happening >>>>>> >>>>>> $ ceph orch upgrade start --image quay.io/ceph/ceph:v16.2.10 >>>>>> >>>>>> Status message is empty.. 
>>>>>> >>>>>> root@ceph1:~# ceph orch upgrade status >>>>>> { >>>>>> "target_image": "quay.io/ceph/ceph:v16.2.10", >>>>>> "in_progress": true, >>>>>> "services_complete": [], >>>>>> "message": "" >>>>>> } >>>>>> >>>>>> Nothing in Logs >>>>>> >>>>>> root@ceph1:~# tail -f >>>>>> /var/log/ceph/f270ad9e-1f6f-11ed-b6f8-a539d87379ea/ceph.cephadm.log >>>>>> 2022-09-02T14:31:52.597661+0000 mgr.ceph2.huidoh (mgr.344392) 174 : >>>>>> cephadm [INF] refreshing ceph2 facts >>>>>> 2022-09-02T14:31:52.991450+0000 mgr.ceph2.huidoh (mgr.344392) 176 : >>>>>> cephadm [INF] refreshing ceph1 facts >>>>>> 2022-09-02T14:32:52.965092+0000 mgr.ceph2.huidoh (mgr.344392) 207 : >>>>>> cephadm [INF] refreshing ceph2 facts >>>>>> 2022-09-02T14:32:53.369789+0000 mgr.ceph2.huidoh (mgr.344392) 208 : >>>>>> cephadm [INF] refreshing ceph1 facts >>>>>> 2022-09-02T14:33:53.367986+0000 mgr.ceph2.huidoh (mgr.344392) 239 : >>>>>> cephadm [INF] refreshing ceph2 facts >>>>>> 2022-09-02T14:33:53.760427+0000 mgr.ceph2.huidoh (mgr.344392) 240 : >>>>>> cephadm [INF] refreshing ceph1 facts >>>>>> 2022-09-02T14:34:53.754277+0000 mgr.ceph2.huidoh (mgr.344392) 272 : >>>>>> cephadm [INF] refreshing ceph2 facts >>>>>> 2022-09-02T14:34:54.162503+0000 mgr.ceph2.huidoh (mgr.344392) 273 : >>>>>> cephadm [INF] refreshing ceph1 facts >>>>>> 2022-09-02T14:35:54.133467+0000 mgr.ceph2.huidoh (mgr.344392) 305 : >>>>>> cephadm [INF] refreshing ceph2 facts >>>>>> 2022-09-02T14:35:54.522171+0000 mgr.ceph2.huidoh (mgr.344392) 306 : >>>>>> cephadm [INF] refreshing ceph1 facts >>>>>> >>>>>> In progress that mesg stuck there for long time >>>>>> >>>>>> root@ceph1:~# ceph -s >>>>>> cluster: >>>>>> id: f270ad9e-1f6f-11ed-b6f8-a539d87379ea >>>>>> health: HEALTH_OK >>>>>> >>>>>> services: >>>>>> mon: 1 daemons, quorum ceph1 (age 9h) >>>>>> mgr: ceph2.huidoh(active, since 9m), standbys: ceph1.smfvfd >>>>>> osd: 4 osds: 4 up (since 9h), 4 in (since 11h) >>>>>> >>>>>> data: >>>>>> pools: 5 pools, 129 pgs >>>>>> objects: 20.06k objects, 83 GiB >>>>>> usage: 168 GiB used, 632 GiB / 800 GiB avail >>>>>> pgs: 129 active+clean >>>>>> >>>>>> io: >>>>>> client: 12 KiB/s wr, 0 op/s rd, 1 op/s wr >>>>>> >>>>>> progress: >>>>>> Upgrade to quay.io/ceph/ceph:v16.2.10 (0s) >>>>>> [............................] >>>>>> >>>>>> On Fri, Sep 2, 2022 at 10:25 AM Satish Patel <satish.txt@xxxxxxxxx> >>>>>> wrote: >>>>>> >>>>>>> It Looks like I did it with the following command. >>>>>>> >>>>>>> $ ceph orch daemon add mgr ceph2:10.73.0.192 >>>>>>> >>>>>>> Now i can see two with same version 15.x >>>>>>> >>>>>>> root@ceph1:~# ceph orch ps --daemon-type mgr >>>>>>> NAME HOST STATUS REFRESHED AGE VERSION >>>>>>> IMAGE NAME >>>>>>> IMAGE ID CONTAINER ID >>>>>>> mgr.ceph1.smfvfd ceph1 running (8h) 41s ago 8h 15.2.17 >>>>>>> quay.io/ceph/ceph@sha256:c08064dde4bba4e72a1f55d90ca32df9ef5aafab82efe2e0a0722444a5aaacca >>>>>>> 93146564743f 1aab837306d2 >>>>>>> mgr.ceph2.huidoh ceph2 running (60s) 110s ago 60s 15.2.17 >>>>>>> quay.io/ceph/ceph@sha256:c08064dde4bba4e72a1f55d90ca32df9ef5aafab82efe2e0a0722444a5aaacca >>>>>>> 93146564743f 294fd6ab6c97 >>>>>>> >>>>>>> On Fri, Sep 2, 2022 at 10:19 AM Satish Patel <satish.txt@xxxxxxxxx> >>>>>>> wrote: >>>>>>> >>>>>>>> Let's come back to the original question: how to bring back the >>>>>>>> second mgr? >>>>>>>> >>>>>>>> root@ceph1:~# ceph orch apply mgr 2 >>>>>>>> Scheduled mgr update... 
>>>>>>>> >>>>>>>> Nothing happened with above command, logs saying nothing >>>>>>>> >>>>>>>> 2022-09-02T14:16:20.407927+0000 mgr.ceph1.smfvfd (mgr.334626) 16939 >>>>>>>> : cephadm [INF] refreshing ceph2 facts >>>>>>>> 2022-09-02T14:16:40.247195+0000 mgr.ceph1.smfvfd (mgr.334626) 16952 >>>>>>>> : cephadm [INF] Saving service mgr spec with placement count:2 >>>>>>>> 2022-09-02T14:16:53.106919+0000 mgr.ceph1.smfvfd (mgr.334626) 16961 >>>>>>>> : cephadm [INF] Saving service mgr spec with placement count:2 >>>>>>>> 2022-09-02T14:17:19.135203+0000 mgr.ceph1.smfvfd (mgr.334626) 16975 >>>>>>>> : cephadm [INF] refreshing ceph1 facts >>>>>>>> 2022-09-02T14:17:20.780496+0000 mgr.ceph1.smfvfd (mgr.334626) 16977 >>>>>>>> : cephadm [INF] refreshing ceph2 facts >>>>>>>> 2022-09-02T14:18:19.502034+0000 mgr.ceph1.smfvfd (mgr.334626) 17008 >>>>>>>> : cephadm [INF] refreshing ceph1 facts >>>>>>>> 2022-09-02T14:18:21.127973+0000 mgr.ceph1.smfvfd (mgr.334626) 17010 >>>>>>>> : cephadm [INF] refreshing ceph2 facts >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Fri, Sep 2, 2022 at 10:15 AM Satish Patel <satish.txt@xxxxxxxxx> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Hi Adam, >>>>>>>>> >>>>>>>>> Wait..wait.. now it's working suddenly without doing anything.. >>>>>>>>> very odd >>>>>>>>> >>>>>>>>> root@ceph1:~# ceph orch ls >>>>>>>>> NAME RUNNING REFRESHED AGE PLACEMENT IMAGE >>>>>>>>> NAME >>>>>>>>> IMAGE ID >>>>>>>>> alertmanager 1/1 5s ago 2w count:1 >>>>>>>>> quay.io/prometheus/alertmanager:v0.20.0 >>>>>>>>> 0881eb8f169f >>>>>>>>> crash 2/2 5s ago 2w * >>>>>>>>> quay.io/ceph/ceph:v15 >>>>>>>>> 93146564743f >>>>>>>>> grafana 1/1 5s ago 2w count:1 >>>>>>>>> quay.io/ceph/ceph-grafana:6.7.4 >>>>>>>>> 557c83e11646 >>>>>>>>> mgr 1/2 5s ago 8h count:2 >>>>>>>>> quay.io/ceph/ceph@sha256:c08064dde4bba4e72a1f55d90ca32df9ef5aafab82efe2e0a0722444a5aaacca >>>>>>>>> 93146564743f >>>>>>>>> mon 1/2 5s ago 8h ceph1;ceph2 >>>>>>>>> quay.io/ceph/ceph:v15 >>>>>>>>> 93146564743f >>>>>>>>> node-exporter 2/2 5s ago 2w * >>>>>>>>> quay.io/prometheus/node-exporter:v0.18.1 >>>>>>>>> e5a616e4b9cf >>>>>>>>> osd.osd_spec_default 4/0 5s ago - <unmanaged> >>>>>>>>> quay.io/ceph/ceph:v15 >>>>>>>>> 93146564743f >>>>>>>>> prometheus 1/1 5s ago 2w count:1 >>>>>>>>> quay.io/prometheus/prometheus:v2.18.1 >>>>>>>>> >>>>>>>>> On Fri, Sep 2, 2022 at 10:13 AM Satish Patel <satish.txt@xxxxxxxxx> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> I can see that in the output but I'm not sure how to get rid of >>>>>>>>>> it. 
>>>>>>>>>> >>>>>>>>>> root@ceph1:~# ceph orch ps --refresh >>>>>>>>>> NAME >>>>>>>>>> HOST STATUS REFRESHED AGE VERSION IMAGE NAME >>>>>>>>>> >>>>>>>>>> IMAGE ID CONTAINER ID >>>>>>>>>> alertmanager.ceph1 >>>>>>>>>> ceph1 running (9h) 64s ago 2w 0.20.0 >>>>>>>>>> quay.io/prometheus/alertmanager:v0.20.0 >>>>>>>>>> 0881eb8f169f ba804b555378 >>>>>>>>>> cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d >>>>>>>>>> ceph2 stopped 65s ago - <unknown> <unknown> >>>>>>>>>> <unknown> >>>>>>>>>> <unknown> >>>>>>>>>> crash.ceph1 >>>>>>>>>> ceph1 running (9h) 64s ago 2w 15.2.17 >>>>>>>>>> quay.io/ceph/ceph:v15 >>>>>>>>>> 93146564743f a3a431d834fc >>>>>>>>>> crash.ceph2 >>>>>>>>>> ceph2 running (9h) 65s ago 13d 15.2.17 >>>>>>>>>> quay.io/ceph/ceph:v15 >>>>>>>>>> 93146564743f 3c963693ff2b >>>>>>>>>> grafana.ceph1 >>>>>>>>>> ceph1 running (9h) 64s ago 2w 6.7.4 >>>>>>>>>> quay.io/ceph/ceph-grafana:6.7.4 >>>>>>>>>> 557c83e11646 7583a8dc4c61 >>>>>>>>>> mgr.ceph1.smfvfd >>>>>>>>>> ceph1 running (8h) 64s ago 8h 15.2.17 >>>>>>>>>> quay.io/ceph/ceph@sha256:c08064dde4bba4e72a1f55d90ca32df9ef5aafab82efe2e0a0722444a5aaacca >>>>>>>>>> 93146564743f 1aab837306d2 >>>>>>>>>> mon.ceph1 >>>>>>>>>> ceph1 running (9h) 64s ago 2w 15.2.17 >>>>>>>>>> quay.io/ceph/ceph:v15 >>>>>>>>>> 93146564743f c1d155d8c7ad >>>>>>>>>> node-exporter.ceph1 >>>>>>>>>> ceph1 running (9h) 64s ago 2w 0.18.1 >>>>>>>>>> quay.io/prometheus/node-exporter:v0.18.1 >>>>>>>>>> e5a616e4b9cf 2ff235fe0e42 >>>>>>>>>> node-exporter.ceph2 >>>>>>>>>> ceph2 running (9h) 65s ago 13d 0.18.1 >>>>>>>>>> quay.io/prometheus/node-exporter:v0.18.1 >>>>>>>>>> e5a616e4b9cf 17678b9ba602 >>>>>>>>>> osd.0 >>>>>>>>>> ceph1 running (9h) 64s ago 13d 15.2.17 >>>>>>>>>> quay.io/ceph/ceph:v15 >>>>>>>>>> 93146564743f d0fd73b777a3 >>>>>>>>>> osd.1 >>>>>>>>>> ceph1 running (9h) 64s ago 13d 15.2.17 >>>>>>>>>> quay.io/ceph/ceph:v15 >>>>>>>>>> 93146564743f 049120e83102 >>>>>>>>>> osd.2 >>>>>>>>>> ceph2 running (9h) 65s ago 13d 15.2.17 >>>>>>>>>> quay.io/ceph/ceph:v15 >>>>>>>>>> 93146564743f 8700e8cefd1f >>>>>>>>>> osd.3 >>>>>>>>>> ceph2 running (9h) 65s ago 13d 15.2.17 >>>>>>>>>> quay.io/ceph/ceph:v15 >>>>>>>>>> 93146564743f 9c71bc87ed16 >>>>>>>>>> prometheus.ceph1 >>>>>>>>>> ceph1 running (9h) 64s ago 2w 2.18.1 >>>>>>>>>> quay.io/prometheus/prometheus:v2.18.1 >>>>>>>>>> de242295e225 74a538efd61e >>>>>>>>>> >>>>>>>>>> On Fri, Sep 2, 2022 at 10:10 AM Adam King <adking@xxxxxxxxxx> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> maybe also a "ceph orch ps --refresh"? It might still have the >>>>>>>>>>> old cached daemon inventory from before you remove the files. >>>>>>>>>>> >>>>>>>>>>> On Fri, Sep 2, 2022 at 9:57 AM Satish Patel < >>>>>>>>>>> satish.txt@xxxxxxxxx> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi Adam, >>>>>>>>>>>> >>>>>>>>>>>> I have deleted file located here - rm >>>>>>>>>>>> /var/lib/ceph/f270ad9e-1f6f-11ed-b6f8-a539d87379ea/cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d >>>>>>>>>>>> >>>>>>>>>>>> But still getting the same error, do i need to do anything >>>>>>>>>>>> else? >>>>>>>>>>>> >>>>>>>>>>>> On Fri, Sep 2, 2022 at 9:51 AM Adam King <adking@xxxxxxxxxx> >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Okay, I'm wondering if this is an issue with version mismatch. >>>>>>>>>>>>> Having previously had a 16.2.10 mgr and then now having a 15.2.17 one that >>>>>>>>>>>>> doesn't expect this sort of thing to be present. 
Either way, I'd think just >>>>>>>>>>>>> deleting this cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d file (and any others like it) >>>>>>>>>>>>> would be the way forward to get orch ls working again. >>>>>>>>>>>>> >>>>>>>>>>>>> On Fri, Sep 2, 2022 at 9:44 AM Satish Patel < >>>>>>>>>>>>> satish.txt@xxxxxxxxx> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Hi Adam, >>>>>>>>>>>>>> >>>>>>>>>>>>>> In the "cephadm ls" output I found the following service, but I believe it >>>>>>>>>>>>>> was there before as well. >>>>>>>>>>>>>> >>>>>>>>>>>>>> { >>>>>>>>>>>>>> "style": "cephadm:v1", >>>>>>>>>>>>>> "name": >>>>>>>>>>>>>> "cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d", >>>>>>>>>>>>>> "fsid": "f270ad9e-1f6f-11ed-b6f8-a539d87379ea", >>>>>>>>>>>>>> "systemd_unit": >>>>>>>>>>>>>> "ceph-f270ad9e-1f6f-11ed-b6f8-a539d87379ea@cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d", >>>>>>>>>>>>>> "enabled": false, >>>>>>>>>>>>>> "state": "stopped", >>>>>>>>>>>>>> "container_id": null, >>>>>>>>>>>>>> "container_image_name": null, >>>>>>>>>>>>>> "container_image_id": null, >>>>>>>>>>>>>> "version": null, >>>>>>>>>>>>>> "started": null, >>>>>>>>>>>>>> "created": null, >>>>>>>>>>>>>> "deployed": null, >>>>>>>>>>>>>> "configured": null >>>>>>>>>>>>>> }, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Looks like the removal didn't work: >>>>>>>>>>>>>> >>>>>>>>>>>>>> root@ceph1:~# ceph orch rm cephadm >>>>>>>>>>>>>> Failed to remove service. <cephadm> was not found. >>>>>>>>>>>>>> >>>>>>>>>>>>>> root@ceph1:~# ceph orch rm >>>>>>>>>>>>>> cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d >>>>>>>>>>>>>> Failed to remove service. >>>>>>>>>>>>>> <cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d> >>>>>>>>>>>>>> was not found. >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, Sep 2, 2022 at 8:27 AM Adam King <adking@xxxxxxxxxx> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> This looks like an old traceback you would get if you ended >>>>>>>>>>>>>>> up with a service type that shouldn't be there somehow. The first thing I'd >>>>>>>>>>>>>>> check is that "cephadm ls" on either host definitely doesn't >>>>>>>>>>>>>>> report any strange things that aren't actually daemons in your cluster, such >>>>>>>>>>>>>>> as "cephadm.<hash>". Another thing you could maybe try, as I believe the >>>>>>>>>>>>>>> assertion it's giving is for an unknown service type here ("AssertionError: >>>>>>>>>>>>>>> cephadm"), is just "ceph orch rm cephadm", which would maybe cause it to >>>>>>>>>>>>>>> remove whatever it thinks is this "cephadm" service that it has deployed. >>>>>>>>>>>>>>> Lastly, you could try having the mgr you manually deploy be a 16.2.10 one >>>>>>>>>>>>>>> instead of 15.2.17 (I'm assuming here, but the line numbers in that >>>>>>>>>>>>>>> traceback suggest Octopus). The 16.2.10 one is just much less likely to >>>>>>>>>>>>>>> have a bug that causes something like this. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Fri, Sep 2, 2022 at 1:41 AM Satish Patel < >>>>>>>>>>>>>>> satish.txt@xxxxxxxxx> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Now when I run "ceph orch ps" it works, but the following >>>>>>>>>>>>>>>> command throws an >>>>>>>>>>>>>>>> error.
Trying to bring up second mgr using ceph orch apply >>>>>>>>>>>>>>>> mgr command but >>>>>>>>>>>>>>>> didn't help >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> root@ceph1:/ceph-disk# ceph version >>>>>>>>>>>>>>>> ceph version 15.2.17 >>>>>>>>>>>>>>>> (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus >>>>>>>>>>>>>>>> (stable) >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> root@ceph1:/ceph-disk# ceph orch ls >>>>>>>>>>>>>>>> Error EINVAL: Traceback (most recent call last): >>>>>>>>>>>>>>>> File "/usr/share/ceph/mgr/mgr_module.py", line 1212, in >>>>>>>>>>>>>>>> _handle_command >>>>>>>>>>>>>>>> return self.handle_command(inbuf, cmd) >>>>>>>>>>>>>>>> File "/usr/share/ceph/mgr/orchestrator/_interface.py", >>>>>>>>>>>>>>>> line 140, in >>>>>>>>>>>>>>>> handle_command >>>>>>>>>>>>>>>> return dispatch[cmd['prefix']].call(self, cmd, inbuf) >>>>>>>>>>>>>>>> File "/usr/share/ceph/mgr/mgr_module.py", line 320, in >>>>>>>>>>>>>>>> call >>>>>>>>>>>>>>>> return self.func(mgr, **kwargs) >>>>>>>>>>>>>>>> File "/usr/share/ceph/mgr/orchestrator/_interface.py", >>>>>>>>>>>>>>>> line 102, in >>>>>>>>>>>>>>>> <lambda> >>>>>>>>>>>>>>>> wrapper_copy = lambda *l_args, **l_kwargs: >>>>>>>>>>>>>>>> wrapper(*l_args, **l_kwargs) >>>>>>>>>>>>>>>> File "/usr/share/ceph/mgr/orchestrator/_interface.py", >>>>>>>>>>>>>>>> line 91, in wrapper >>>>>>>>>>>>>>>> return func(*args, **kwargs) >>>>>>>>>>>>>>>> File "/usr/share/ceph/mgr/orchestrator/module.py", line >>>>>>>>>>>>>>>> 503, in >>>>>>>>>>>>>>>> _list_services >>>>>>>>>>>>>>>> raise_if_exception(completion) >>>>>>>>>>>>>>>> File "/usr/share/ceph/mgr/orchestrator/_interface.py", >>>>>>>>>>>>>>>> line 642, in >>>>>>>>>>>>>>>> raise_if_exception >>>>>>>>>>>>>>>> raise e >>>>>>>>>>>>>>>> AssertionError: cephadm >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Fri, Sep 2, 2022 at 1:32 AM Satish Patel < >>>>>>>>>>>>>>>> satish.txt@xxxxxxxxx> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> > nevermind, i found doc related that and i am able to get >>>>>>>>>>>>>>>> 1 mgr up - >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> https://docs.ceph.com/en/quincy/cephadm/troubleshooting/#manually-deploying-a-mgr-daemon >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > On Fri, Sep 2, 2022 at 1:21 AM Satish Patel < >>>>>>>>>>>>>>>> satish.txt@xxxxxxxxx> wrote: >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> >> Folks, >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> >> I am having little fun time with cephadm and it's very >>>>>>>>>>>>>>>> annoying to deal >>>>>>>>>>>>>>>> >> with it >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> >> I have deployed a ceph cluster using cephadm on two >>>>>>>>>>>>>>>> nodes. Now when i was >>>>>>>>>>>>>>>> >> trying to upgrade and noticed hiccups where it just >>>>>>>>>>>>>>>> upgraded a single mgr >>>>>>>>>>>>>>>> >> with 16.2.10 but not other so i started messing around >>>>>>>>>>>>>>>> and somehow I >>>>>>>>>>>>>>>> >> deleted both mgr in the thought that cephadm will >>>>>>>>>>>>>>>> recreate them. >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> >> Now i don't have any single mgr so my ceph orch command >>>>>>>>>>>>>>>> hangs forever and >>>>>>>>>>>>>>>> >> looks like a chicken egg issue. >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> >> How do I recover from this? If I can't run the ceph orch >>>>>>>>>>>>>>>> command, I won't >>>>>>>>>>>>>>>> >> be able to redeploy my mgr daemons. >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> >> I am not able to find any mgr in the following command >>>>>>>>>>>>>>>> on both nodes. 
>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> >> $ cephadm ls | grep mgr >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> >
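To pull the manual path discussed above together in one place, the sequence comes down to roughly the following (a sketch only; mgr.ceph1.smfvfd and mgr.ceph2.huidoh are the daemon names from this particular cluster, so substitute your own, and check "ceph -s" to see which mgr is the standby before each redeploy):

# stop the stuck upgrade so it doesn't fight the manual redeploys
ceph orch upgrade stop
# redeploy the current standby mgr on the new image (assuming here that ceph1.smfvfd is the standby)
ceph orch daemon redeploy mgr.ceph1.smfvfd quay.io/ceph/ceph:v16.2.10
# fail over so the remaining 15.2.17 mgr becomes the standby, then redeploy it too
ceph mgr fail
ceph orch daemon redeploy mgr.ceph2.huidoh quay.io/ceph/ceph:v16.2.10
# with both mgr daemons on 16.2.10, redeploy the mgr service and restart the upgrade
ceph orch redeploy mgr
ceph orch upgrade start --image quay.io/ceph/ceph:v16.2.10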