Adam, I have enabled debug and my logs flood with the following. I am going to try some of the suggestions from the mailing list thread you provided and see how it goes.

root@ceph1:~# tail -f /var/log/ceph/f270ad9e-1f6f-11ed-b6f8-a539d87379ea/ceph.cephadm.log
2022-09-02T18:38:21.754391+0000 mgr.ceph2.huidoh (mgr.344392) 211198 : cephadm [DBG] 0 OSDs are scheduled for removal: []
2022-09-02T18:38:21.754519+0000 mgr.ceph2.huidoh (mgr.344392) 211199 : cephadm [DBG] Saving [] to store
2022-09-02T18:38:21.757155+0000 mgr.ceph2.huidoh (mgr.344392) 211200 : cephadm [DBG] refreshing hosts and daemons
2022-09-02T18:38:21.758065+0000 mgr.ceph2.huidoh (mgr.344392) 211201 : cephadm [DBG] _check_for_strays
2022-09-02T18:38:21.758334+0000 mgr.ceph2.huidoh (mgr.344392) 211202 : cephadm [DBG] 0 OSDs are scheduled for removal: []
2022-09-02T18:38:21.758455+0000 mgr.ceph2.huidoh (mgr.344392) 211203 : cephadm [DBG] Saving [] to store
2022-09-02T18:38:21.761001+0000 mgr.ceph2.huidoh (mgr.344392) 211204 : cephadm [DBG] refreshing hosts and daemons
2022-09-02T18:38:21.762092+0000 mgr.ceph2.huidoh (mgr.344392) 211205 : cephadm [DBG] _check_for_strays
2022-09-02T18:38:21.762357+0000 mgr.ceph2.huidoh (mgr.344392) 211206 : cephadm [DBG] 0 OSDs are scheduled for removal: []
2022-09-02T18:38:21.762480+0000 mgr.ceph2.huidoh (mgr.344392) 211207 : cephadm [DBG] Saving [] to store

On Fri, Sep 2, 2022 at 12:17 PM Adam King <adking@xxxxxxxxxx> wrote:
> hmm, okay. It seems like cephadm is stuck in general rather than an issue specific to the upgrade. I'd first make sure the orchestrator isn't paused (just running "ceph orch resume" should be enough, it's idempotent).
>
> Beyond that, there was someone else who had an issue with things getting stuck that was resolved in this thread
> https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/NKKLV5TMHFA3ERGCMJ3M7BVLA5PGIR4M/#NKKLV5TMHFA3ERGCMJ3M7BVLA5PGIR4M
> that might be worth a look.
>
> If you haven't already, it's possible stopping the upgrade is a good idea, as maybe that's interfering with it getting to the point where it does the redeploy.
>
> If none of those help, it might be worth setting the log level to debug and seeing where things are ending up ("ceph config set mgr mgr/cephadm/log_to_cluster_level debug; ceph orch ps --refresh", then waiting a few minutes before running "ceph log last 100 debug cephadm"; I'm not 100% sure of the format of that command, so if it fails try just "ceph log last cephadm"). We could maybe get more info on why it's not performing the redeploy from those debug logs. Just remember to set the log level back afterward with "ceph config set mgr mgr/cephadm/log_to_cluster_level info", as debug logs are quite verbose.
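
(For reference, the debug-logging cycle Adam describes above, collected into one command sequence. The exact form of the "ceph log last" command may vary by release, as he notes.)

$ ceph config set mgr mgr/cephadm/log_to_cluster_level debug   # raise cephadm's cluster log level
$ ceph orch ps --refresh                                       # force a refresh so something gets logged
  ... wait a few minutes ...
$ ceph log last 100 debug cephadm                              # if this form fails, try: ceph log last cephadm
$ ceph config set mgr mgr/cephadm/log_to_cluster_level info    # set the level back; debug is very verbose
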
>
> On Fri, Sep 2, 2022 at 11:39 AM Satish Patel <satish.txt@xxxxxxxxx> wrote:
>
>> Hi Adam,
>>
>> As you suggested, I did the following:
>>
>> $ ceph orch daemon redeploy mgr.ceph1.smfvfd quay.io/ceph/ceph:v16.2.10
>>
>> I noticed the following line in the logs, but then no activity at all; the standby mgr is still running the older version.
>>
>> 2022-09-02T15:35:45.753093+0000 mgr.ceph2.huidoh (mgr.344392) 2226 : cephadm [INF] Schedule redeploy daemon mgr.ceph1.smfvfd
>> 2022-09-02T15:36:17.279190+0000 mgr.ceph2.huidoh (mgr.344392) 2245 : cephadm [INF] refreshing ceph2 facts
>> 2022-09-02T15:36:17.984478+0000 mgr.ceph2.huidoh (mgr.344392) 2246 : cephadm [INF] refreshing ceph1 facts
>> 2022-09-02T15:37:17.663730+0000 mgr.ceph2.huidoh (mgr.344392) 2284 : cephadm [INF] refreshing ceph2 facts
>> 2022-09-02T15:37:18.386586+0000 mgr.ceph2.huidoh (mgr.344392) 2285 : cephadm [INF] refreshing ceph1 facts
>>
>> I am also not seeing any new image being downloaded:
>>
>> root@ceph1:~# docker image ls
>> REPOSITORY                        TAG      IMAGE ID       CREATED        SIZE
>> quay.io/ceph/ceph                 v15      93146564743f   3 weeks ago    1.2GB
>> quay.io/ceph/ceph-grafana         8.3.5    dad864ee21e9   4 months ago   558MB
>> quay.io/prometheus/prometheus     v2.33.4  514e6a882f6e   6 months ago   204MB
>> quay.io/prometheus/alertmanager   v0.23.0  ba2b418f427c   12 months ago  57.5MB
>> quay.io/ceph/ceph-grafana         6.7.4    557c83e11646   13 months ago  486MB
>> quay.io/prometheus/prometheus     v2.18.1  de242295e225   2 years ago    140MB
>> quay.io/prometheus/alertmanager   v0.20.0  0881eb8f169f   2 years ago    52.1MB
>> quay.io/prometheus/node-exporter  v0.18.1  e5a616e4b9cf   3 years ago    22.9MB
>>
>> On Fri, Sep 2, 2022 at 11:06 AM Adam King <adking@xxxxxxxxxx> wrote:
>>
>>> hmm, at this point, maybe we should just try manually upgrading the mgr daemons and then move on from there. First, just stop the upgrade with "ceph orch upgrade stop". If you figure out which of the two mgr daemons is the standby (it should say which one is active in "ceph -s" output) and then do a "ceph orch daemon redeploy <standby-mgr-name> quay.io/ceph/ceph:v16.2.10", it should redeploy that specific mgr with the new version. You could then do a "ceph mgr fail" to swap which of the mgr daemons is active, then do another "ceph orch daemon redeploy <standby-mgr-name> quay.io/ceph/ceph:v16.2.10" where the standby is now the other mgr still on 15.2.17. Once the mgr daemons are both upgraded to the new version, run a "ceph orch redeploy mgr" and then "ceph orch upgrade start --image quay.io/ceph/ceph:v16.2.10" and see if it goes better.
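
(For reference, a consolidated sketch of the manual mgr upgrade sequence Adam describes above; <standby-mgr-name> is a placeholder for whichever mgr "ceph -s" reports as the standby at each step.)

$ ceph orch upgrade stop
$ ceph -s                                                                    # note which mgr is active, which is standby
$ ceph orch daemon redeploy <standby-mgr-name> quay.io/ceph/ceph:v16.2.10    # redeploy the standby on the new version
$ ceph mgr fail                                                              # swap active/standby
$ ceph orch daemon redeploy <standby-mgr-name> quay.io/ceph/ceph:v16.2.10    # now the other mgr, still on 15.2.17
$ ceph orch redeploy mgr
$ ceph orch upgrade start --image quay.io/ceph/ceph:v16.2.10
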
>>>
>>> On Fri, Sep 2, 2022 at 10:36 AM Satish Patel <satish.txt@xxxxxxxxx> wrote:
>>>
>>>> Hi Adam,
>>>>
>>>> I ran the following command to upgrade, but it looks like nothing is happening:
>>>>
>>>> $ ceph orch upgrade start --image quay.io/ceph/ceph:v16.2.10
>>>>
>>>> The status message is empty:
>>>>
>>>> root@ceph1:~# ceph orch upgrade status
>>>> {
>>>>     "target_image": "quay.io/ceph/ceph:v16.2.10",
>>>>     "in_progress": true,
>>>>     "services_complete": [],
>>>>     "message": ""
>>>> }
>>>>
>>>> Nothing in the logs:
>>>>
>>>> root@ceph1:~# tail -f /var/log/ceph/f270ad9e-1f6f-11ed-b6f8-a539d87379ea/ceph.cephadm.log
>>>> 2022-09-02T14:31:52.597661+0000 mgr.ceph2.huidoh (mgr.344392) 174 : cephadm [INF] refreshing ceph2 facts
>>>> 2022-09-02T14:31:52.991450+0000 mgr.ceph2.huidoh (mgr.344392) 176 : cephadm [INF] refreshing ceph1 facts
>>>> 2022-09-02T14:32:52.965092+0000 mgr.ceph2.huidoh (mgr.344392) 207 : cephadm [INF] refreshing ceph2 facts
>>>> 2022-09-02T14:32:53.369789+0000 mgr.ceph2.huidoh (mgr.344392) 208 : cephadm [INF] refreshing ceph1 facts
>>>> 2022-09-02T14:33:53.367986+0000 mgr.ceph2.huidoh (mgr.344392) 239 : cephadm [INF] refreshing ceph2 facts
>>>> 2022-09-02T14:33:53.760427+0000 mgr.ceph2.huidoh (mgr.344392) 240 : cephadm [INF] refreshing ceph1 facts
>>>> 2022-09-02T14:34:53.754277+0000 mgr.ceph2.huidoh (mgr.344392) 272 : cephadm [INF] refreshing ceph2 facts
>>>> 2022-09-02T14:34:54.162503+0000 mgr.ceph2.huidoh (mgr.344392) 273 : cephadm [INF] refreshing ceph1 facts
>>>> 2022-09-02T14:35:54.133467+0000 mgr.ceph2.huidoh (mgr.344392) 305 : cephadm [INF] refreshing ceph2 facts
>>>> 2022-09-02T14:35:54.522171+0000 mgr.ceph2.huidoh (mgr.344392) 306 : cephadm [INF] refreshing ceph1 facts
>>>>
>>>> The progress message has been stuck there for a long time:
>>>>
>>>> root@ceph1:~# ceph -s
>>>>   cluster:
>>>>     id:     f270ad9e-1f6f-11ed-b6f8-a539d87379ea
>>>>     health: HEALTH_OK
>>>>
>>>>   services:
>>>>     mon: 1 daemons, quorum ceph1 (age 9h)
>>>>     mgr: ceph2.huidoh(active, since 9m), standbys: ceph1.smfvfd
>>>>     osd: 4 osds: 4 up (since 9h), 4 in (since 11h)
>>>>
>>>>   data:
>>>>     pools:   5 pools, 129 pgs
>>>>     objects: 20.06k objects, 83 GiB
>>>>     usage:   168 GiB used, 632 GiB / 800 GiB avail
>>>>     pgs:     129 active+clean
>>>>
>>>>   io:
>>>>     client:   12 KiB/s wr, 0 op/s rd, 1 op/s wr
>>>>
>>>>   progress:
>>>>     Upgrade to quay.io/ceph/ceph:v16.2.10 (0s)
>>>>       [............................]
>>>>
>>>> On Fri, Sep 2, 2022 at 10:25 AM Satish Patel <satish.txt@xxxxxxxxx> wrote:
>>>>
>>>>> It looks like I did it with the following command:
>>>>>
>>>>> $ ceph orch daemon add mgr ceph2:10.73.0.192
>>>>>
>>>>> Now I can see two mgrs with the same version, 15.x:
>>>>>
>>>>> root@ceph1:~# ceph orch ps --daemon-type mgr
>>>>> NAME  HOST  STATUS  REFRESHED  AGE  VERSION  IMAGE NAME  IMAGE ID  CONTAINER ID
>>>>> mgr.ceph1.smfvfd  ceph1  running (8h)  41s ago  8h  15.2.17  quay.io/ceph/ceph@sha256:c08064dde4bba4e72a1f55d90ca32df9ef5aafab82efe2e0a0722444a5aaacca  93146564743f  1aab837306d2
>>>>> mgr.ceph2.huidoh  ceph2  running (60s)  110s ago  60s  15.2.17  quay.io/ceph/ceph@sha256:c08064dde4bba4e72a1f55d90ca32df9ef5aafab82efe2e0a0722444a5aaacca  93146564743f  294fd6ab6c97
>>>>>
>>>>> On Fri, Sep 2, 2022 at 10:19 AM Satish Patel <satish.txt@xxxxxxxxx> wrote:
>>>>>
>>>>>> Let's come back to the original question: how to bring back the second mgr?
>>>>>>
>>>>>> root@ceph1:~# ceph orch apply mgr 2
>>>>>> Scheduled mgr update...
>>>>>>
>>>>>> Nothing happened after the above command, and the logs say nothing:
>>>>>>
>>>>>> 2022-09-02T14:16:20.407927+0000 mgr.ceph1.smfvfd (mgr.334626) 16939 : cephadm [INF] refreshing ceph2 facts
>>>>>> 2022-09-02T14:16:40.247195+0000 mgr.ceph1.smfvfd (mgr.334626) 16952 : cephadm [INF] Saving service mgr spec with placement count:2
>>>>>> 2022-09-02T14:16:53.106919+0000 mgr.ceph1.smfvfd (mgr.334626) 16961 : cephadm [INF] Saving service mgr spec with placement count:2
>>>>>> 2022-09-02T14:17:19.135203+0000 mgr.ceph1.smfvfd (mgr.334626) 16975 : cephadm [INF] refreshing ceph1 facts
>>>>>> 2022-09-02T14:17:20.780496+0000 mgr.ceph1.smfvfd (mgr.334626) 16977 : cephadm [INF] refreshing ceph2 facts
>>>>>> 2022-09-02T14:18:19.502034+0000 mgr.ceph1.smfvfd (mgr.334626) 17008 : cephadm [INF] refreshing ceph1 facts
>>>>>> 2022-09-02T14:18:21.127973+0000 mgr.ceph1.smfvfd (mgr.334626) 17010 : cephadm [INF] refreshing ceph2 facts
>>>>>>
>>>>>> On Fri, Sep 2, 2022 at 10:15 AM Satish Patel <satish.txt@xxxxxxxxx> wrote:
>>>>>>
>>>>>>> Hi Adam,
>>>>>>>
>>>>>>> Wait... now it's suddenly working without me doing anything. Very odd.
>>>>>>>
>>>>>>> root@ceph1:~# ceph orch ls
>>>>>>> NAME                  RUNNING  REFRESHED  AGE  PLACEMENT    IMAGE NAME                                                                                  IMAGE ID
>>>>>>> alertmanager          1/1      5s ago     2w   count:1      quay.io/prometheus/alertmanager:v0.20.0                                                     0881eb8f169f
>>>>>>> crash                 2/2      5s ago     2w   *            quay.io/ceph/ceph:v15                                                                       93146564743f
>>>>>>> grafana               1/1      5s ago     2w   count:1      quay.io/ceph/ceph-grafana:6.7.4                                                             557c83e11646
>>>>>>> mgr                   1/2      5s ago     8h   count:2      quay.io/ceph/ceph@sha256:c08064dde4bba4e72a1f55d90ca32df9ef5aafab82efe2e0a0722444a5aaacca   93146564743f
>>>>>>> mon                   1/2      5s ago     8h   ceph1;ceph2  quay.io/ceph/ceph:v15                                                                       93146564743f
>>>>>>> node-exporter         2/2      5s ago     2w   *            quay.io/prometheus/node-exporter:v0.18.1                                                    e5a616e4b9cf
>>>>>>> osd.osd_spec_default  4/0      5s ago     -    <unmanaged>  quay.io/ceph/ceph:v15                                                                       93146564743f
>>>>>>> prometheus            1/1      5s ago     2w   count:1      quay.io/prometheus/prometheus:v2.18.1
>>>>>>>
>>>>>>> On Fri, Sep 2, 2022 at 10:13 AM Satish Patel <satish.txt@xxxxxxxxx> wrote:
>>>>>>>
>>>>>>>> I can see that in the output, but I'm not sure how to get rid of it.
>>>>>>>>
>>>>>>>> root@ceph1:~# ceph orch ps --refresh
>>>>>>>> NAME  HOST  STATUS  REFRESHED  AGE  VERSION  IMAGE NAME  IMAGE ID  CONTAINER ID
>>>>>>>> alertmanager.ceph1  ceph1  running (9h)  64s ago  2w  0.20.0  quay.io/prometheus/alertmanager:v0.20.0  0881eb8f169f  ba804b555378
>>>>>>>> cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d  ceph2  stopped  65s ago  -  <unknown>  <unknown>  <unknown>  <unknown>
>>>>>>>> crash.ceph1  ceph1  running (9h)  64s ago  2w  15.2.17  quay.io/ceph/ceph:v15  93146564743f  a3a431d834fc
>>>>>>>> crash.ceph2  ceph2  running (9h)  65s ago  13d  15.2.17  quay.io/ceph/ceph:v15  93146564743f  3c963693ff2b
>>>>>>>> grafana.ceph1  ceph1  running (9h)  64s ago  2w  6.7.4  quay.io/ceph/ceph-grafana:6.7.4  557c83e11646  7583a8dc4c61
>>>>>>>> mgr.ceph1.smfvfd  ceph1  running (8h)  64s ago  8h  15.2.17  quay.io/ceph/ceph@sha256:c08064dde4bba4e72a1f55d90ca32df9ef5aafab82efe2e0a0722444a5aaacca  93146564743f  1aab837306d2
>>>>>>>> mon.ceph1  ceph1  running (9h)  64s ago  2w  15.2.17  quay.io/ceph/ceph:v15  93146564743f  c1d155d8c7ad
>>>>>>>> node-exporter.ceph1  ceph1  running (9h)  64s ago  2w  0.18.1  quay.io/prometheus/node-exporter:v0.18.1  e5a616e4b9cf  2ff235fe0e42
>>>>>>>> node-exporter.ceph2  ceph2  running (9h)  65s ago  13d  0.18.1  quay.io/prometheus/node-exporter:v0.18.1  e5a616e4b9cf  17678b9ba602
>>>>>>>> osd.0  ceph1  running (9h)  64s ago  13d  15.2.17  quay.io/ceph/ceph:v15  93146564743f  d0fd73b777a3
>>>>>>>> osd.1  ceph1  running (9h)  64s ago  13d  15.2.17  quay.io/ceph/ceph:v15  93146564743f  049120e83102
>>>>>>>> osd.2  ceph2  running (9h)  65s ago  13d  15.2.17  quay.io/ceph/ceph:v15  93146564743f  8700e8cefd1f
>>>>>>>> osd.3  ceph2  running (9h)  65s ago  13d  15.2.17  quay.io/ceph/ceph:v15  93146564743f  9c71bc87ed16
>>>>>>>> prometheus.ceph1  ceph1  running (9h)  64s ago  2w  2.18.1  quay.io/prometheus/prometheus:v2.18.1  de242295e225  74a538efd61e
>>>>>>>>
>>>>>>>> On Fri, Sep 2, 2022 at 10:10 AM Adam King <adking@xxxxxxxxxx> wrote:
>>>>>>>>
>>>>>>>>> maybe also a "ceph orch ps --refresh"? It might still have the old cached daemon inventory from before you removed the files.
>>>>>>>>>
>>>>>>>>> On Fri, Sep 2, 2022 at 9:57 AM Satish Patel <satish.txt@xxxxxxxxx> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Adam,
>>>>>>>>>>
>>>>>>>>>> I have deleted the file located here:
>>>>>>>>>> rm /var/lib/ceph/f270ad9e-1f6f-11ed-b6f8-a539d87379ea/cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d
>>>>>>>>>>
>>>>>>>>>> But I'm still getting the same error. Do I need to do anything else?
>>>>>>>>>>
>>>>>>>>>> On Fri, Sep 2, 2022 at 9:51 AM Adam King <adking@xxxxxxxxxx> wrote:
>>>>>>>>>>
>>>>>>>>>>> Okay, I'm wondering if this is an issue with version mismatch: having previously had a 16.2.10 mgr, you now have a 15.2.17 one that doesn't expect this sort of thing to be present. Either way, I'd think just deleting this cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d file (and any others like it) would be the way forward to get orch ls working again.
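
(For reference, the stray-entry cleanup discussed above comes down to something like the following; <fsid> and <hash> are placeholders for the values shown by "cephadm ls" on the affected host.)

# on the host where "cephadm ls" reports the stray cephadm.<hash> entry
$ rm /var/lib/ceph/<fsid>/cephadm.<hash>

# then refresh cephadm's cached daemon inventory and re-check
$ ceph orch ps --refresh
$ ceph orch ls
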
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Sep 2, 2022 at 9:44 AM Satish Patel <satish.txt@xxxxxxxxx> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Adam,
>>>>>>>>>>>>
>>>>>>>>>>>> In "cephadm ls" I found the following entry, but I believe it was there before as well.
>>>>>>>>>>>>
>>>>>>>>>>>> {
>>>>>>>>>>>>     "style": "cephadm:v1",
>>>>>>>>>>>>     "name": "cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d",
>>>>>>>>>>>>     "fsid": "f270ad9e-1f6f-11ed-b6f8-a539d87379ea",
>>>>>>>>>>>>     "systemd_unit": "ceph-f270ad9e-1f6f-11ed-b6f8-a539d87379ea@cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d",
>>>>>>>>>>>>     "enabled": false,
>>>>>>>>>>>>     "state": "stopped",
>>>>>>>>>>>>     "container_id": null,
>>>>>>>>>>>>     "container_image_name": null,
>>>>>>>>>>>>     "container_image_id": null,
>>>>>>>>>>>>     "version": null,
>>>>>>>>>>>>     "started": null,
>>>>>>>>>>>>     "created": null,
>>>>>>>>>>>>     "deployed": null,
>>>>>>>>>>>>     "configured": null
>>>>>>>>>>>> },
>>>>>>>>>>>>
>>>>>>>>>>>> It looks like the remove didn't work:
>>>>>>>>>>>>
>>>>>>>>>>>> root@ceph1:~# ceph orch rm cephadm
>>>>>>>>>>>> Failed to remove service. <cephadm> was not found.
>>>>>>>>>>>>
>>>>>>>>>>>> root@ceph1:~# ceph orch rm cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d
>>>>>>>>>>>> Failed to remove service. <cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d> was not found.
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Sep 2, 2022 at 8:27 AM Adam King <adking@xxxxxxxxxx> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> this looks like an old traceback you would get if you ended up with a service type that shouldn't be there somehow. The thing I'd probably check is that "cephadm ls" on either host definitely doesn't report any strange things that aren't actually daemons in your cluster, such as "cephadm.<hash>". Another thing you could maybe try, as I believe the assertion it's giving is for an unknown service type here ("AssertionError: cephadm"), is just "ceph orch rm cephadm", which would maybe cause it to remove whatever it thinks is this "cephadm" service that it has deployed. Lastly, you could try having the mgr you manually deploy be a 16.2.10 one instead of 15.2.17 (I'm assuming here, but the line numbers in that traceback suggest octopus). The 16.2.10 one is just much less likely to have a bug that causes something like this.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Sep 2, 2022 at 1:41 AM Satish Patel <satish.txt@xxxxxxxxx> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Now when I run "ceph orch ps" it works, but the following command throws an error.
>>>>>>>>>>>>>> I tried to bring up the second mgr using the "ceph orch apply mgr" command, but it didn't help.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> root@ceph1:/ceph-disk# ceph version
>>>>>>>>>>>>>> ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus (stable)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> root@ceph1:/ceph-disk# ceph orch ls
>>>>>>>>>>>>>> Error EINVAL: Traceback (most recent call last):
>>>>>>>>>>>>>>   File "/usr/share/ceph/mgr/mgr_module.py", line 1212, in _handle_command
>>>>>>>>>>>>>>     return self.handle_command(inbuf, cmd)
>>>>>>>>>>>>>>   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 140, in handle_command
>>>>>>>>>>>>>>     return dispatch[cmd['prefix']].call(self, cmd, inbuf)
>>>>>>>>>>>>>>   File "/usr/share/ceph/mgr/mgr_module.py", line 320, in call
>>>>>>>>>>>>>>     return self.func(mgr, **kwargs)
>>>>>>>>>>>>>>   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 102, in <lambda>
>>>>>>>>>>>>>>     wrapper_copy = lambda *l_args, **l_kwargs: wrapper(*l_args, **l_kwargs)
>>>>>>>>>>>>>>   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 91, in wrapper
>>>>>>>>>>>>>>     return func(*args, **kwargs)
>>>>>>>>>>>>>>   File "/usr/share/ceph/mgr/orchestrator/module.py", line 503, in _list_services
>>>>>>>>>>>>>>     raise_if_exception(completion)
>>>>>>>>>>>>>>   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 642, in raise_if_exception
>>>>>>>>>>>>>>     raise e
>>>>>>>>>>>>>> AssertionError: cephadm
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Sep 2, 2022 at 1:32 AM Satish Patel <satish.txt@xxxxxxxxx> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> > Never mind, I found the doc about that and I was able to get one mgr up:
>>>>>>>>>>>>>> > https://docs.ceph.com/en/quincy/cephadm/troubleshooting/#manually-deploying-a-mgr-daemon
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > On Fri, Sep 2, 2022 at 1:21 AM Satish Patel <satish.txt@xxxxxxxxx> wrote:
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> >> Folks,
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >> I am not having a fun time with cephadm; it's very annoying to deal with.
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >> I have deployed a Ceph cluster using cephadm on two nodes. When I was trying to upgrade, I noticed a hiccup where it upgraded a single mgr to 16.2.10 but not the other, so I started messing around and somehow deleted both mgrs, thinking cephadm would recreate them.
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >> Now I don't have a single mgr, so my ceph orch commands hang forever; it looks like a chicken-and-egg issue.
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >> How do I recover from this? If I can't run the ceph orch command, I won't be able to redeploy my mgr daemons.
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >> I am not able to find any mgr with the following command on either node.
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >> $ cephadm ls | grep mgr

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx