Do you think this is because I have only a single MON daemon running? I have
only two nodes.

On Fri, Sep 2, 2022 at 2:39 PM Satish Patel <satish.txt@xxxxxxxxx> wrote:

> Adam,
>
> I have enabled debug and my logs are flooded with the following. I am going
> to try some of the suggestions from the thread you provided and see.
>
> root@ceph1:~# tail -f /var/log/ceph/f270ad9e-1f6f-11ed-b6f8-a539d87379ea/ceph.cephadm.log
> 2022-09-02T18:38:21.754391+0000 mgr.ceph2.huidoh (mgr.344392) 211198 : cephadm [DBG] 0 OSDs are scheduled for removal: []
> 2022-09-02T18:38:21.754519+0000 mgr.ceph2.huidoh (mgr.344392) 211199 : cephadm [DBG] Saving [] to store
> 2022-09-02T18:38:21.757155+0000 mgr.ceph2.huidoh (mgr.344392) 211200 : cephadm [DBG] refreshing hosts and daemons
> 2022-09-02T18:38:21.758065+0000 mgr.ceph2.huidoh (mgr.344392) 211201 : cephadm [DBG] _check_for_strays
> 2022-09-02T18:38:21.758334+0000 mgr.ceph2.huidoh (mgr.344392) 211202 : cephadm [DBG] 0 OSDs are scheduled for removal: []
> 2022-09-02T18:38:21.758455+0000 mgr.ceph2.huidoh (mgr.344392) 211203 : cephadm [DBG] Saving [] to store
> 2022-09-02T18:38:21.761001+0000 mgr.ceph2.huidoh (mgr.344392) 211204 : cephadm [DBG] refreshing hosts and daemons
> 2022-09-02T18:38:21.762092+0000 mgr.ceph2.huidoh (mgr.344392) 211205 : cephadm [DBG] _check_for_strays
> 2022-09-02T18:38:21.762357+0000 mgr.ceph2.huidoh (mgr.344392) 211206 : cephadm [DBG] 0 OSDs are scheduled for removal: []
> 2022-09-02T18:38:21.762480+0000 mgr.ceph2.huidoh (mgr.344392) 211207 : cephadm [DBG] Saving [] to store
>
> On Fri, Sep 2, 2022 at 12:17 PM Adam King <adking@xxxxxxxxxx> wrote:
>
>> Hmm, okay. It seems like cephadm is stuck in general rather than hitting an
>> issue specific to the upgrade. I'd first make sure the orchestrator isn't
>> paused (just running "ceph orch resume" should be enough; it's idempotent).
>>
>> Beyond that, someone else had an issue with things getting stuck that was
>> resolved in this thread, which might be worth a look:
>> https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/NKKLV5TMHFA3ERGCMJ3M7BVLA5PGIR4M/#NKKLV5TMHFA3ERGCMJ3M7BVLA5PGIR4M
>>
>> If you haven't already, it's possible that stopping the upgrade is a good
>> idea, as it may be interfering with cephadm getting to the point where it
>> does the redeploy.
>>
>> If none of those help, it might be worth setting the log level to debug and
>> seeing where things are ending up: run "ceph config set mgr
>> mgr/cephadm/log_to_cluster_level debug" followed by "ceph orch ps --refresh",
>> wait a few minutes, then run "ceph log last 100 debug cephadm" (not 100% sure
>> of the format of that command; if it fails, try just "ceph log last cephadm").
>> We could maybe get more info from those debug logs on why it's not performing
>> the redeploy. Just remember to set the log level back afterwards with "ceph
>> config set mgr mgr/cephadm/log_to_cluster_level info", as debug logs are
>> quite verbose.
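For reference, the debug-logging sequence described above, collected in one
place. This is a sketch only; as noted above, the exact syntax of the
"ceph log last" command may differ between releases.

$ ceph orch resume                                             # make sure the orchestrator is not paused
$ ceph orch upgrade stop                                       # optionally stop the stuck upgrade first
$ ceph config set mgr mgr/cephadm/log_to_cluster_level debug   # raise the cephadm cluster log level
$ ceph orch ps --refresh                                       # trigger a refresh, then wait a few minutes
$ ceph log last 100 debug cephadm                              # read the recent cephadm debug messages
$ ceph log last cephadm                                        # fallback if the previous form is rejected
$ ceph config set mgr mgr/cephadm/log_to_cluster_level info    # set the log level back when finished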
>> >> On Fri, Sep 2, 2022 at 11:39 AM Satish Patel <satish.txt@xxxxxxxxx> >> wrote: >> >>> Hi Adam, >>> >>> As you said, i did following >>> >>> $ ceph orch daemon redeploy mgr.ceph1.smfvfd quay.io/ceph/ceph:v16.2.10 >>> >>> Noticed following line in logs but then no activity nothing, still >>> standby mgr running in older version >>> >>> 2022-09-02T15:35:45.753093+0000 mgr.ceph2.huidoh (mgr.344392) 2226 : >>> cephadm [INF] Schedule redeploy daemon mgr.ceph1.smfvfd >>> 2022-09-02T15:36:17.279190+0000 mgr.ceph2.huidoh (mgr.344392) 2245 : >>> cephadm [INF] refreshing ceph2 facts >>> 2022-09-02T15:36:17.984478+0000 mgr.ceph2.huidoh (mgr.344392) 2246 : >>> cephadm [INF] refreshing ceph1 facts >>> 2022-09-02T15:37:17.663730+0000 mgr.ceph2.huidoh (mgr.344392) 2284 : >>> cephadm [INF] refreshing ceph2 facts >>> 2022-09-02T15:37:18.386586+0000 mgr.ceph2.huidoh (mgr.344392) 2285 : >>> cephadm [INF] refreshing ceph1 facts >>> >>> I am not seeing any image get downloaded also >>> >>> root@ceph1:~# docker image ls >>> REPOSITORY TAG IMAGE ID CREATED >>> SIZE >>> quay.io/ceph/ceph v15 93146564743f 3 weeks ago >>> 1.2GB >>> quay.io/ceph/ceph-grafana 8.3.5 dad864ee21e9 4 months >>> ago 558MB >>> quay.io/prometheus/prometheus v2.33.4 514e6a882f6e 6 months >>> ago 204MB >>> quay.io/prometheus/alertmanager v0.23.0 ba2b418f427c 12 months >>> ago 57.5MB >>> quay.io/ceph/ceph-grafana 6.7.4 557c83e11646 13 months >>> ago 486MB >>> quay.io/prometheus/prometheus v2.18.1 de242295e225 2 years ago >>> 140MB >>> quay.io/prometheus/alertmanager v0.20.0 0881eb8f169f 2 years ago >>> 52.1MB >>> quay.io/prometheus/node-exporter v0.18.1 e5a616e4b9cf 3 years ago >>> 22.9MB >>> >>> >>> On Fri, Sep 2, 2022 at 11:06 AM Adam King <adking@xxxxxxxxxx> wrote: >>> >>>> hmm, at this point, maybe we should just try manually upgrading the mgr >>>> daemons and then move from there. First, just stop the upgrade "ceph orch >>>> upgrade stop". If you figure out which of the two mgr daemons is the >>>> standby (it should say which one is active in "ceph -s" output) and then do >>>> a "ceph orch daemon redeploy <standby-mgr-name> >>>> quay.io/ceph/ceph:v16.2.10" it should redeploy that specific mgr with >>>> the new version. You could then do a "ceph mgr fail" to swap which of the >>>> mgr daemons is active, then do another "ceph orch daemon redeploy >>>> <standby-mgr-name> quay.io/ceph/ceph:v16.2.10" where the standby is >>>> now the other mgr still on 15.2.17. Once the mgr daemons are both upgraded >>>> to the new version, run a "ceph orch redeploy mgr" and then "ceph orch >>>> upgrade start --image quay.io/ceph/ceph:v16.2.10" and see if it goes >>>> better. >>>> >>>> On Fri, Sep 2, 2022 at 10:36 AM Satish Patel <satish.txt@xxxxxxxxx> >>>> wrote: >>>> >>>>> Hi Adam, >>>>> >>>>> I run the following command to upgrade but it looks like nothing is >>>>> happening >>>>> >>>>> $ ceph orch upgrade start --image quay.io/ceph/ceph:v16.2.10 >>>>> >>>>> Status message is empty.. 
>>>>> >>>>> root@ceph1:~# ceph orch upgrade status >>>>> { >>>>> "target_image": "quay.io/ceph/ceph:v16.2.10", >>>>> "in_progress": true, >>>>> "services_complete": [], >>>>> "message": "" >>>>> } >>>>> >>>>> Nothing in Logs >>>>> >>>>> root@ceph1:~# tail -f >>>>> /var/log/ceph/f270ad9e-1f6f-11ed-b6f8-a539d87379ea/ceph.cephadm.log >>>>> 2022-09-02T14:31:52.597661+0000 mgr.ceph2.huidoh (mgr.344392) 174 : >>>>> cephadm [INF] refreshing ceph2 facts >>>>> 2022-09-02T14:31:52.991450+0000 mgr.ceph2.huidoh (mgr.344392) 176 : >>>>> cephadm [INF] refreshing ceph1 facts >>>>> 2022-09-02T14:32:52.965092+0000 mgr.ceph2.huidoh (mgr.344392) 207 : >>>>> cephadm [INF] refreshing ceph2 facts >>>>> 2022-09-02T14:32:53.369789+0000 mgr.ceph2.huidoh (mgr.344392) 208 : >>>>> cephadm [INF] refreshing ceph1 facts >>>>> 2022-09-02T14:33:53.367986+0000 mgr.ceph2.huidoh (mgr.344392) 239 : >>>>> cephadm [INF] refreshing ceph2 facts >>>>> 2022-09-02T14:33:53.760427+0000 mgr.ceph2.huidoh (mgr.344392) 240 : >>>>> cephadm [INF] refreshing ceph1 facts >>>>> 2022-09-02T14:34:53.754277+0000 mgr.ceph2.huidoh (mgr.344392) 272 : >>>>> cephadm [INF] refreshing ceph2 facts >>>>> 2022-09-02T14:34:54.162503+0000 mgr.ceph2.huidoh (mgr.344392) 273 : >>>>> cephadm [INF] refreshing ceph1 facts >>>>> 2022-09-02T14:35:54.133467+0000 mgr.ceph2.huidoh (mgr.344392) 305 : >>>>> cephadm [INF] refreshing ceph2 facts >>>>> 2022-09-02T14:35:54.522171+0000 mgr.ceph2.huidoh (mgr.344392) 306 : >>>>> cephadm [INF] refreshing ceph1 facts >>>>> >>>>> In progress that mesg stuck there for long time >>>>> >>>>> root@ceph1:~# ceph -s >>>>> cluster: >>>>> id: f270ad9e-1f6f-11ed-b6f8-a539d87379ea >>>>> health: HEALTH_OK >>>>> >>>>> services: >>>>> mon: 1 daemons, quorum ceph1 (age 9h) >>>>> mgr: ceph2.huidoh(active, since 9m), standbys: ceph1.smfvfd >>>>> osd: 4 osds: 4 up (since 9h), 4 in (since 11h) >>>>> >>>>> data: >>>>> pools: 5 pools, 129 pgs >>>>> objects: 20.06k objects, 83 GiB >>>>> usage: 168 GiB used, 632 GiB / 800 GiB avail >>>>> pgs: 129 active+clean >>>>> >>>>> io: >>>>> client: 12 KiB/s wr, 0 op/s rd, 1 op/s wr >>>>> >>>>> progress: >>>>> Upgrade to quay.io/ceph/ceph:v16.2.10 (0s) >>>>> [............................] >>>>> >>>>> On Fri, Sep 2, 2022 at 10:25 AM Satish Patel <satish.txt@xxxxxxxxx> >>>>> wrote: >>>>> >>>>>> It Looks like I did it with the following command. >>>>>> >>>>>> $ ceph orch daemon add mgr ceph2:10.73.0.192 >>>>>> >>>>>> Now i can see two with same version 15.x >>>>>> >>>>>> root@ceph1:~# ceph orch ps --daemon-type mgr >>>>>> NAME HOST STATUS REFRESHED AGE VERSION >>>>>> IMAGE NAME >>>>>> IMAGE ID CONTAINER ID >>>>>> mgr.ceph1.smfvfd ceph1 running (8h) 41s ago 8h 15.2.17 >>>>>> quay.io/ceph/ceph@sha256:c08064dde4bba4e72a1f55d90ca32df9ef5aafab82efe2e0a0722444a5aaacca >>>>>> 93146564743f 1aab837306d2 >>>>>> mgr.ceph2.huidoh ceph2 running (60s) 110s ago 60s 15.2.17 >>>>>> quay.io/ceph/ceph@sha256:c08064dde4bba4e72a1f55d90ca32df9ef5aafab82efe2e0a0722444a5aaacca >>>>>> 93146564743f 294fd6ab6c97 >>>>>> >>>>>> On Fri, Sep 2, 2022 at 10:19 AM Satish Patel <satish.txt@xxxxxxxxx> >>>>>> wrote: >>>>>> >>>>>>> Let's come back to the original question: how to bring back the >>>>>>> second mgr? >>>>>>> >>>>>>> root@ceph1:~# ceph orch apply mgr 2 >>>>>>> Scheduled mgr update... 
>>>>>>> >>>>>>> Nothing happened with above command, logs saying nothing >>>>>>> >>>>>>> 2022-09-02T14:16:20.407927+0000 mgr.ceph1.smfvfd (mgr.334626) 16939 >>>>>>> : cephadm [INF] refreshing ceph2 facts >>>>>>> 2022-09-02T14:16:40.247195+0000 mgr.ceph1.smfvfd (mgr.334626) 16952 >>>>>>> : cephadm [INF] Saving service mgr spec with placement count:2 >>>>>>> 2022-09-02T14:16:53.106919+0000 mgr.ceph1.smfvfd (mgr.334626) 16961 >>>>>>> : cephadm [INF] Saving service mgr spec with placement count:2 >>>>>>> 2022-09-02T14:17:19.135203+0000 mgr.ceph1.smfvfd (mgr.334626) 16975 >>>>>>> : cephadm [INF] refreshing ceph1 facts >>>>>>> 2022-09-02T14:17:20.780496+0000 mgr.ceph1.smfvfd (mgr.334626) 16977 >>>>>>> : cephadm [INF] refreshing ceph2 facts >>>>>>> 2022-09-02T14:18:19.502034+0000 mgr.ceph1.smfvfd (mgr.334626) 17008 >>>>>>> : cephadm [INF] refreshing ceph1 facts >>>>>>> 2022-09-02T14:18:21.127973+0000 mgr.ceph1.smfvfd (mgr.334626) 17010 >>>>>>> : cephadm [INF] refreshing ceph2 facts >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Fri, Sep 2, 2022 at 10:15 AM Satish Patel <satish.txt@xxxxxxxxx> >>>>>>> wrote: >>>>>>> >>>>>>>> Hi Adam, >>>>>>>> >>>>>>>> Wait..wait.. now it's working suddenly without doing anything.. >>>>>>>> very odd >>>>>>>> >>>>>>>> root@ceph1:~# ceph orch ls >>>>>>>> NAME RUNNING REFRESHED AGE PLACEMENT IMAGE >>>>>>>> NAME >>>>>>>> IMAGE ID >>>>>>>> alertmanager 1/1 5s ago 2w count:1 >>>>>>>> quay.io/prometheus/alertmanager:v0.20.0 >>>>>>>> 0881eb8f169f >>>>>>>> crash 2/2 5s ago 2w * >>>>>>>> quay.io/ceph/ceph:v15 >>>>>>>> 93146564743f >>>>>>>> grafana 1/1 5s ago 2w count:1 >>>>>>>> quay.io/ceph/ceph-grafana:6.7.4 >>>>>>>> 557c83e11646 >>>>>>>> mgr 1/2 5s ago 8h count:2 >>>>>>>> quay.io/ceph/ceph@sha256:c08064dde4bba4e72a1f55d90ca32df9ef5aafab82efe2e0a0722444a5aaacca >>>>>>>> 93146564743f >>>>>>>> mon 1/2 5s ago 8h ceph1;ceph2 >>>>>>>> quay.io/ceph/ceph:v15 >>>>>>>> 93146564743f >>>>>>>> node-exporter 2/2 5s ago 2w * >>>>>>>> quay.io/prometheus/node-exporter:v0.18.1 >>>>>>>> e5a616e4b9cf >>>>>>>> osd.osd_spec_default 4/0 5s ago - <unmanaged> >>>>>>>> quay.io/ceph/ceph:v15 >>>>>>>> 93146564743f >>>>>>>> prometheus 1/1 5s ago 2w count:1 >>>>>>>> quay.io/prometheus/prometheus:v2.18.1 >>>>>>>> >>>>>>>> On Fri, Sep 2, 2022 at 10:13 AM Satish Patel <satish.txt@xxxxxxxxx> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> I can see that in the output but I'm not sure how to get rid of >>>>>>>>> it. 
>>>>>>>>> >>>>>>>>> root@ceph1:~# ceph orch ps --refresh >>>>>>>>> NAME >>>>>>>>> HOST STATUS REFRESHED AGE VERSION IMAGE NAME >>>>>>>>> IMAGE >>>>>>>>> ID CONTAINER ID >>>>>>>>> alertmanager.ceph1 >>>>>>>>> ceph1 running (9h) 64s ago 2w 0.20.0 >>>>>>>>> quay.io/prometheus/alertmanager:v0.20.0 >>>>>>>>> 0881eb8f169f ba804b555378 >>>>>>>>> cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d >>>>>>>>> ceph2 stopped 65s ago - <unknown> <unknown> >>>>>>>>> <unknown> >>>>>>>>> <unknown> >>>>>>>>> crash.ceph1 >>>>>>>>> ceph1 running (9h) 64s ago 2w 15.2.17 >>>>>>>>> quay.io/ceph/ceph:v15 >>>>>>>>> 93146564743f a3a431d834fc >>>>>>>>> crash.ceph2 >>>>>>>>> ceph2 running (9h) 65s ago 13d 15.2.17 >>>>>>>>> quay.io/ceph/ceph:v15 >>>>>>>>> 93146564743f 3c963693ff2b >>>>>>>>> grafana.ceph1 >>>>>>>>> ceph1 running (9h) 64s ago 2w 6.7.4 >>>>>>>>> quay.io/ceph/ceph-grafana:6.7.4 >>>>>>>>> 557c83e11646 7583a8dc4c61 >>>>>>>>> mgr.ceph1.smfvfd >>>>>>>>> ceph1 running (8h) 64s ago 8h 15.2.17 >>>>>>>>> quay.io/ceph/ceph@sha256:c08064dde4bba4e72a1f55d90ca32df9ef5aafab82efe2e0a0722444a5aaacca >>>>>>>>> 93146564743f 1aab837306d2 >>>>>>>>> mon.ceph1 >>>>>>>>> ceph1 running (9h) 64s ago 2w 15.2.17 >>>>>>>>> quay.io/ceph/ceph:v15 >>>>>>>>> 93146564743f c1d155d8c7ad >>>>>>>>> node-exporter.ceph1 >>>>>>>>> ceph1 running (9h) 64s ago 2w 0.18.1 >>>>>>>>> quay.io/prometheus/node-exporter:v0.18.1 >>>>>>>>> e5a616e4b9cf 2ff235fe0e42 >>>>>>>>> node-exporter.ceph2 >>>>>>>>> ceph2 running (9h) 65s ago 13d 0.18.1 >>>>>>>>> quay.io/prometheus/node-exporter:v0.18.1 >>>>>>>>> e5a616e4b9cf 17678b9ba602 >>>>>>>>> osd.0 >>>>>>>>> ceph1 running (9h) 64s ago 13d 15.2.17 >>>>>>>>> quay.io/ceph/ceph:v15 >>>>>>>>> 93146564743f d0fd73b777a3 >>>>>>>>> osd.1 >>>>>>>>> ceph1 running (9h) 64s ago 13d 15.2.17 >>>>>>>>> quay.io/ceph/ceph:v15 >>>>>>>>> 93146564743f 049120e83102 >>>>>>>>> osd.2 >>>>>>>>> ceph2 running (9h) 65s ago 13d 15.2.17 >>>>>>>>> quay.io/ceph/ceph:v15 >>>>>>>>> 93146564743f 8700e8cefd1f >>>>>>>>> osd.3 >>>>>>>>> ceph2 running (9h) 65s ago 13d 15.2.17 >>>>>>>>> quay.io/ceph/ceph:v15 >>>>>>>>> 93146564743f 9c71bc87ed16 >>>>>>>>> prometheus.ceph1 >>>>>>>>> ceph1 running (9h) 64s ago 2w 2.18.1 >>>>>>>>> quay.io/prometheus/prometheus:v2.18.1 >>>>>>>>> de242295e225 74a538efd61e >>>>>>>>> >>>>>>>>> On Fri, Sep 2, 2022 at 10:10 AM Adam King <adking@xxxxxxxxxx> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> maybe also a "ceph orch ps --refresh"? It might still have the >>>>>>>>>> old cached daemon inventory from before you remove the files. >>>>>>>>>> >>>>>>>>>> On Fri, Sep 2, 2022 at 9:57 AM Satish Patel <satish.txt@xxxxxxxxx> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Hi Adam, >>>>>>>>>>> >>>>>>>>>>> I have deleted file located here - rm >>>>>>>>>>> /var/lib/ceph/f270ad9e-1f6f-11ed-b6f8-a539d87379ea/cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d >>>>>>>>>>> >>>>>>>>>>> But still getting the same error, do i need to do anything else? >>>>>>>>>>> >>>>>>>>>>> On Fri, Sep 2, 2022 at 9:51 AM Adam King <adking@xxxxxxxxxx> >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> Okay, I'm wondering if this is an issue with version mismatch. >>>>>>>>>>>> Having previously had a 16.2.10 mgr and then now having a 15.2.17 one that >>>>>>>>>>>> doesn't expect this sort of thing to be present. 
Either way, I'd think just >>>>>>>>>>>> deleting this cephadm.7ce656a8721deb5054c37b0cfb9038 >>>>>>>>>>>> 1522d521dde51fb0c5a2142314d663f63d (and any others like it) file >>>>>>>>>>>> would be the way forward to get orch ls working again. >>>>>>>>>>>> >>>>>>>>>>>> On Fri, Sep 2, 2022 at 9:44 AM Satish Patel < >>>>>>>>>>>> satish.txt@xxxxxxxxx> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi Adam, >>>>>>>>>>>>> >>>>>>>>>>>>> In cephadm ls i found the following service but i believe it >>>>>>>>>>>>> was there before also. >>>>>>>>>>>>> >>>>>>>>>>>>> { >>>>>>>>>>>>> "style": "cephadm:v1", >>>>>>>>>>>>> "name": >>>>>>>>>>>>> "cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d", >>>>>>>>>>>>> "fsid": "f270ad9e-1f6f-11ed-b6f8-a539d87379ea", >>>>>>>>>>>>> "systemd_unit": >>>>>>>>>>>>> "ceph-f270ad9e-1f6f-11ed-b6f8-a539d87379ea@cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d >>>>>>>>>>>>> ", >>>>>>>>>>>>> "enabled": false, >>>>>>>>>>>>> "state": "stopped", >>>>>>>>>>>>> "container_id": null, >>>>>>>>>>>>> "container_image_name": null, >>>>>>>>>>>>> "container_image_id": null, >>>>>>>>>>>>> "version": null, >>>>>>>>>>>>> "started": null, >>>>>>>>>>>>> "created": null, >>>>>>>>>>>>> "deployed": null, >>>>>>>>>>>>> "configured": null >>>>>>>>>>>>> }, >>>>>>>>>>>>> >>>>>>>>>>>>> Look like remove didn't work >>>>>>>>>>>>> >>>>>>>>>>>>> root@ceph1:~# ceph orch rm cephadm >>>>>>>>>>>>> Failed to remove service. <cephadm> was not found. >>>>>>>>>>>>> >>>>>>>>>>>>> root@ceph1:~# ceph orch rm >>>>>>>>>>>>> cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d >>>>>>>>>>>>> Failed to remove service. >>>>>>>>>>>>> <cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d> >>>>>>>>>>>>> was not found. >>>>>>>>>>>>> >>>>>>>>>>>>> On Fri, Sep 2, 2022 at 8:27 AM Adam King <adking@xxxxxxxxxx> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> this looks like an old traceback you would get if you ended >>>>>>>>>>>>>> up with a service type that shouldn't be there somehow. The things I'd >>>>>>>>>>>>>> probably check are that "cephadm ls" on either host definitely doesn't >>>>>>>>>>>>>> report and strange things that aren't actually daemons in your cluster such >>>>>>>>>>>>>> as "cephadm.<hash>". Another thing you could maybe try, as I believe the >>>>>>>>>>>>>> assertion it's giving is for an unknown service type here ("AssertionError: >>>>>>>>>>>>>> cephadm"), is just "ceph orch rm cephadm" which would maybe cause it to >>>>>>>>>>>>>> remove whatever it thinks is this "cephadm" service that it has deployed. >>>>>>>>>>>>>> Lastly, you could try having the mgr you manually deploy be a 16.2.10 one >>>>>>>>>>>>>> instead of 15.2.17 (I'm assuming here, but the line numbers in that >>>>>>>>>>>>>> traceback suggest octopus). The 16.2.10 one is just much less likely to >>>>>>>>>>>>>> have a bug that causes something like this. >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, Sep 2, 2022 at 1:41 AM Satish Patel < >>>>>>>>>>>>>> satish.txt@xxxxxxxxx> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Now when I run "ceph orch ps" it works but the following >>>>>>>>>>>>>>> command throws an >>>>>>>>>>>>>>> error. 
Trying to bring up second mgr using ceph orch apply >>>>>>>>>>>>>>> mgr command but >>>>>>>>>>>>>>> didn't help >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> root@ceph1:/ceph-disk# ceph version >>>>>>>>>>>>>>> ceph version 15.2.17 >>>>>>>>>>>>>>> (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus >>>>>>>>>>>>>>> (stable) >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> root@ceph1:/ceph-disk# ceph orch ls >>>>>>>>>>>>>>> Error EINVAL: Traceback (most recent call last): >>>>>>>>>>>>>>> File "/usr/share/ceph/mgr/mgr_module.py", line 1212, in >>>>>>>>>>>>>>> _handle_command >>>>>>>>>>>>>>> return self.handle_command(inbuf, cmd) >>>>>>>>>>>>>>> File "/usr/share/ceph/mgr/orchestrator/_interface.py", >>>>>>>>>>>>>>> line 140, in >>>>>>>>>>>>>>> handle_command >>>>>>>>>>>>>>> return dispatch[cmd['prefix']].call(self, cmd, inbuf) >>>>>>>>>>>>>>> File "/usr/share/ceph/mgr/mgr_module.py", line 320, in call >>>>>>>>>>>>>>> return self.func(mgr, **kwargs) >>>>>>>>>>>>>>> File "/usr/share/ceph/mgr/orchestrator/_interface.py", >>>>>>>>>>>>>>> line 102, in >>>>>>>>>>>>>>> <lambda> >>>>>>>>>>>>>>> wrapper_copy = lambda *l_args, **l_kwargs: >>>>>>>>>>>>>>> wrapper(*l_args, **l_kwargs) >>>>>>>>>>>>>>> File "/usr/share/ceph/mgr/orchestrator/_interface.py", >>>>>>>>>>>>>>> line 91, in wrapper >>>>>>>>>>>>>>> return func(*args, **kwargs) >>>>>>>>>>>>>>> File "/usr/share/ceph/mgr/orchestrator/module.py", line >>>>>>>>>>>>>>> 503, in >>>>>>>>>>>>>>> _list_services >>>>>>>>>>>>>>> raise_if_exception(completion) >>>>>>>>>>>>>>> File "/usr/share/ceph/mgr/orchestrator/_interface.py", >>>>>>>>>>>>>>> line 642, in >>>>>>>>>>>>>>> raise_if_exception >>>>>>>>>>>>>>> raise e >>>>>>>>>>>>>>> AssertionError: cephadm >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Fri, Sep 2, 2022 at 1:32 AM Satish Patel < >>>>>>>>>>>>>>> satish.txt@xxxxxxxxx> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> > nevermind, i found doc related that and i am able to get 1 >>>>>>>>>>>>>>> mgr up - >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> https://docs.ceph.com/en/quincy/cephadm/troubleshooting/#manually-deploying-a-mgr-daemon >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > On Fri, Sep 2, 2022 at 1:21 AM Satish Patel < >>>>>>>>>>>>>>> satish.txt@xxxxxxxxx> wrote: >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> >> Folks, >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> >> I am having little fun time with cephadm and it's very >>>>>>>>>>>>>>> annoying to deal >>>>>>>>>>>>>>> >> with it >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> >> I have deployed a ceph cluster using cephadm on two >>>>>>>>>>>>>>> nodes. Now when i was >>>>>>>>>>>>>>> >> trying to upgrade and noticed hiccups where it just >>>>>>>>>>>>>>> upgraded a single mgr >>>>>>>>>>>>>>> >> with 16.2.10 but not other so i started messing around >>>>>>>>>>>>>>> and somehow I >>>>>>>>>>>>>>> >> deleted both mgr in the thought that cephadm will >>>>>>>>>>>>>>> recreate them. >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> >> Now i don't have any single mgr so my ceph orch command >>>>>>>>>>>>>>> hangs forever and >>>>>>>>>>>>>>> >> looks like a chicken egg issue. >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> >> How do I recover from this? If I can't run the ceph orch >>>>>>>>>>>>>>> command, I won't >>>>>>>>>>>>>>> >> be able to redeploy my mgr daemons. >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> >> I am not able to find any mgr in the following command on >>>>>>>>>>>>>>> both nodes. 
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >> $ cephadm ls | grep mgr
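For anyone landing on this thread in the same state: if no mgr is running at
all (the chicken-and-egg situation in the original post above), a mgr first has
to be brought back manually per the troubleshooting doc linked above before any
"ceph orch" command will respond. After that, the manual mgr upgrade path Adam
outlines in his 11:06 AM message, collected in one place, looks roughly like
this. A sketch only: the daemon names mgr.ceph1.smfvfd and mgr.ceph2.huidoh are
specific to this two-node cluster, and which one is the standby at any moment
comes from the "ceph -s" output.

$ ceph orch upgrade stop                                                   # stop the stuck upgrade
$ ceph -s                                                                  # note which mgr is active and which is standby
$ ceph orch daemon redeploy mgr.ceph1.smfvfd quay.io/ceph/ceph:v16.2.10    # redeploy the standby mgr on the new image
$ ceph mgr fail                                                            # fail over so the other mgr becomes standby
$ ceph orch daemon redeploy mgr.ceph2.huidoh quay.io/ceph/ceph:v16.2.10    # redeploy the remaining 15.2.17 mgr
$ ceph orch redeploy mgr                                                   # once both mgrs are on the new version
$ ceph orch upgrade start --image quay.io/ceph/ceph:v16.2.10               # then retry the upgrade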