Yes, I have stopped the upgrade, and those logs are from before the upgrade was stopped.

On Fri, Sep 2, 2022 at 3:27 PM Adam King <adking@xxxxxxxxxx> wrote:

> I don't think the number of mons should have any effect on this. Looking at your logs, the interesting thing is that all the messages are so close together. Was this before having stopped the upgrade?
>
> On Fri, Sep 2, 2022 at 2:53 PM Satish Patel <satish.txt@xxxxxxxxx> wrote:
>
>> Do you think this is because I have only a single MON daemon running? I have only two nodes.
>>
>> On Fri, Sep 2, 2022 at 2:39 PM Satish Patel <satish.txt@xxxxxxxxx> wrote:
>>
>>> Adam,
>>>
>>> I have enabled debug and my logs are flooded with the following. I am going to try some of the suggestions from the thread you provided and see.
>>>
>>> root@ceph1:~# tail -f /var/log/ceph/f270ad9e-1f6f-11ed-b6f8-a539d87379ea/ceph.cephadm.log
>>> 2022-09-02T18:38:21.754391+0000 mgr.ceph2.huidoh (mgr.344392) 211198 : cephadm [DBG] 0 OSDs are scheduled for removal: []
>>> 2022-09-02T18:38:21.754519+0000 mgr.ceph2.huidoh (mgr.344392) 211199 : cephadm [DBG] Saving [] to store
>>> 2022-09-02T18:38:21.757155+0000 mgr.ceph2.huidoh (mgr.344392) 211200 : cephadm [DBG] refreshing hosts and daemons
>>> 2022-09-02T18:38:21.758065+0000 mgr.ceph2.huidoh (mgr.344392) 211201 : cephadm [DBG] _check_for_strays
>>> 2022-09-02T18:38:21.758334+0000 mgr.ceph2.huidoh (mgr.344392) 211202 : cephadm [DBG] 0 OSDs are scheduled for removal: []
>>> 2022-09-02T18:38:21.758455+0000 mgr.ceph2.huidoh (mgr.344392) 211203 : cephadm [DBG] Saving [] to store
>>> 2022-09-02T18:38:21.761001+0000 mgr.ceph2.huidoh (mgr.344392) 211204 : cephadm [DBG] refreshing hosts and daemons
>>> 2022-09-02T18:38:21.762092+0000 mgr.ceph2.huidoh (mgr.344392) 211205 : cephadm [DBG] _check_for_strays
>>> 2022-09-02T18:38:21.762357+0000 mgr.ceph2.huidoh (mgr.344392) 211206 : cephadm [DBG] 0 OSDs are scheduled for removal: []
>>> 2022-09-02T18:38:21.762480+0000 mgr.ceph2.huidoh (mgr.344392) 211207 : cephadm [DBG] Saving [] to store
>>>
>>> On Fri, Sep 2, 2022 at 12:17 PM Adam King <adking@xxxxxxxxxx> wrote:
>>>
>>>> hmm, okay. It seems like cephadm is stuck in general rather than an issue specific to the upgrade. I'd first make sure the orchestrator isn't paused (just running "ceph orch resume" should be enough, it's idempotent).
>>>>
>>>> Beyond that, there was someone else who had an issue with things getting stuck that was resolved in this thread, which might be worth a look:
>>>> https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/NKKLV5TMHFA3ERGCMJ3M7BVLA5PGIR4M/#NKKLV5TMHFA3ERGCMJ3M7BVLA5PGIR4M
>>>>
>>>> If you haven't already, it's possible stopping the upgrade is a good idea, as maybe that's interfering with it getting to the point where it does the redeploy.
>>>>
>>>> If none of those help, it might be worth setting the log level to debug and seeing where things are ending up: "ceph config set mgr mgr/cephadm/log_to_cluster_level debug; ceph orch ps --refresh", then wait a few minutes before running "ceph log last 100 debug cephadm" (not 100% sure on the format of that command; if it fails, try just "ceph log last cephadm"). We could maybe get more info on why it's not performing the redeploy from those debug logs.
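>>>>
>>>> Putting that together, the rough sequence I have in mind is something like this (just a sketch; as noted, I'm not 100% sure of the exact "ceph log last" syntax on your release):
>>>>
>>>>   # turn cephadm debug logging on and force a refresh cycle
>>>>   ceph config set mgr mgr/cephadm/log_to_cluster_level debug
>>>>   ceph orch ps --refresh
>>>>   # wait a few minutes, then dump the recent cephadm messages from the cluster log
>>>>   ceph log last 100 debug cephadm    # if that fails, try: ceph log last cephadm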
>>>> Just remember to set the log level back afterward with 'ceph config set mgr mgr/cephadm/log_to_cluster_level info', as debug logs are quite verbose.
>>>>
>>>> On Fri, Sep 2, 2022 at 11:39 AM Satish Patel <satish.txt@xxxxxxxxx> wrote:
>>>>
>>>>> Hi Adam,
>>>>>
>>>>> As you said, I did the following:
>>>>>
>>>>> $ ceph orch daemon redeploy mgr.ceph1.smfvfd quay.io/ceph/ceph:v16.2.10
>>>>>
>>>>> I noticed the following line in the logs, but then no activity at all; the standby mgr is still running the older version.
>>>>>
>>>>> 2022-09-02T15:35:45.753093+0000 mgr.ceph2.huidoh (mgr.344392) 2226 : cephadm [INF] Schedule redeploy daemon mgr.ceph1.smfvfd
>>>>> 2022-09-02T15:36:17.279190+0000 mgr.ceph2.huidoh (mgr.344392) 2245 : cephadm [INF] refreshing ceph2 facts
>>>>> 2022-09-02T15:36:17.984478+0000 mgr.ceph2.huidoh (mgr.344392) 2246 : cephadm [INF] refreshing ceph1 facts
>>>>> 2022-09-02T15:37:17.663730+0000 mgr.ceph2.huidoh (mgr.344392) 2284 : cephadm [INF] refreshing ceph2 facts
>>>>> 2022-09-02T15:37:18.386586+0000 mgr.ceph2.huidoh (mgr.344392) 2285 : cephadm [INF] refreshing ceph1 facts
>>>>>
>>>>> I am also not seeing any new image being downloaded:
>>>>>
>>>>> root@ceph1:~# docker image ls
>>>>> REPOSITORY                        TAG      IMAGE ID      CREATED        SIZE
>>>>> quay.io/ceph/ceph                 v15      93146564743f  3 weeks ago    1.2GB
>>>>> quay.io/ceph/ceph-grafana         8.3.5    dad864ee21e9  4 months ago   558MB
>>>>> quay.io/prometheus/prometheus     v2.33.4  514e6a882f6e  6 months ago   204MB
>>>>> quay.io/prometheus/alertmanager   v0.23.0  ba2b418f427c  12 months ago  57.5MB
>>>>> quay.io/ceph/ceph-grafana         6.7.4    557c83e11646  13 months ago  486MB
>>>>> quay.io/prometheus/prometheus     v2.18.1  de242295e225  2 years ago    140MB
>>>>> quay.io/prometheus/alertmanager   v0.20.0  0881eb8f169f  2 years ago    52.1MB
>>>>> quay.io/prometheus/node-exporter  v0.18.1  e5a616e4b9cf  3 years ago    22.9MB
>>>>>
>>>>> On Fri, Sep 2, 2022 at 11:06 AM Adam King <adking@xxxxxxxxxx> wrote:
>>>>>
>>>>>> hmm, at this point, maybe we should just try manually upgrading the mgr daemons and go from there. First, just stop the upgrade with "ceph orch upgrade stop". Then figure out which of the two mgr daemons is the standby (the "ceph -s" output says which one is active) and do a "ceph orch daemon redeploy <standby-mgr-name> quay.io/ceph/ceph:v16.2.10"; that should redeploy that specific mgr with the new version. You could then do a "ceph mgr fail" to swap which of the mgr daemons is active, and do another "ceph orch daemon redeploy <standby-mgr-name> quay.io/ceph/ceph:v16.2.10", where the standby is now the other mgr still on 15.2.17. Once the mgr daemons are both upgraded to the new version, run a "ceph orch redeploy mgr" and then "ceph orch upgrade start --image quay.io/ceph/ceph:v16.2.10" and see if it goes better.
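>>>>>>
>>>>>> Spelled out as a rough sequence (just a sketch; substitute the name of whichever mgr is the standby at each step, checking "ceph -s" in between):
>>>>>>
>>>>>>   ceph orch upgrade stop
>>>>>>   # redeploy the current standby mgr on the new image
>>>>>>   ceph orch daemon redeploy <standby-mgr-name> quay.io/ceph/ceph:v16.2.10
>>>>>>   # fail over so the remaining old mgr becomes the standby, then redeploy it too
>>>>>>   ceph mgr fail
>>>>>>   ceph orch daemon redeploy <standby-mgr-name> quay.io/ceph/ceph:v16.2.10
>>>>>>   # once both mgrs are on 16.2.10, retry the orchestrated upgrade
>>>>>>   ceph orch redeploy mgr
>>>>>>   ceph orch upgrade start --image quay.io/ceph/ceph:v16.2.10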
>>>>>>
>>>>>> On Fri, Sep 2, 2022 at 10:36 AM Satish Patel <satish.txt@xxxxxxxxx> wrote:
>>>>>>
>>>>>>> Hi Adam,
>>>>>>>
>>>>>>> I ran the following command to upgrade, but it looks like nothing is happening:
>>>>>>>
>>>>>>> $ ceph orch upgrade start --image quay.io/ceph/ceph:v16.2.10
>>>>>>>
>>>>>>> The status message is empty:
>>>>>>>
>>>>>>> root@ceph1:~# ceph orch upgrade status
>>>>>>> {
>>>>>>>     "target_image": "quay.io/ceph/ceph:v16.2.10",
>>>>>>>     "in_progress": true,
>>>>>>>     "services_complete": [],
>>>>>>>     "message": ""
>>>>>>> }
>>>>>>>
>>>>>>> Nothing in the logs:
>>>>>>>
>>>>>>> root@ceph1:~# tail -f /var/log/ceph/f270ad9e-1f6f-11ed-b6f8-a539d87379ea/ceph.cephadm.log
>>>>>>> 2022-09-02T14:31:52.597661+0000 mgr.ceph2.huidoh (mgr.344392) 174 : cephadm [INF] refreshing ceph2 facts
>>>>>>> 2022-09-02T14:31:52.991450+0000 mgr.ceph2.huidoh (mgr.344392) 176 : cephadm [INF] refreshing ceph1 facts
>>>>>>> 2022-09-02T14:32:52.965092+0000 mgr.ceph2.huidoh (mgr.344392) 207 : cephadm [INF] refreshing ceph2 facts
>>>>>>> 2022-09-02T14:32:53.369789+0000 mgr.ceph2.huidoh (mgr.344392) 208 : cephadm [INF] refreshing ceph1 facts
>>>>>>> 2022-09-02T14:33:53.367986+0000 mgr.ceph2.huidoh (mgr.344392) 239 : cephadm [INF] refreshing ceph2 facts
>>>>>>> 2022-09-02T14:33:53.760427+0000 mgr.ceph2.huidoh (mgr.344392) 240 : cephadm [INF] refreshing ceph1 facts
>>>>>>> 2022-09-02T14:34:53.754277+0000 mgr.ceph2.huidoh (mgr.344392) 272 : cephadm [INF] refreshing ceph2 facts
>>>>>>> 2022-09-02T14:34:54.162503+0000 mgr.ceph2.huidoh (mgr.344392) 273 : cephadm [INF] refreshing ceph1 facts
>>>>>>> 2022-09-02T14:35:54.133467+0000 mgr.ceph2.huidoh (mgr.344392) 305 : cephadm [INF] refreshing ceph2 facts
>>>>>>> 2022-09-02T14:35:54.522171+0000 mgr.ceph2.huidoh (mgr.344392) 306 : cephadm [INF] refreshing ceph1 facts
>>>>>>>
>>>>>>> The in-progress message has been stuck there for a long time:
>>>>>>>
>>>>>>> root@ceph1:~# ceph -s
>>>>>>>   cluster:
>>>>>>>     id:     f270ad9e-1f6f-11ed-b6f8-a539d87379ea
>>>>>>>     health: HEALTH_OK
>>>>>>>
>>>>>>>   services:
>>>>>>>     mon: 1 daemons, quorum ceph1 (age 9h)
>>>>>>>     mgr: ceph2.huidoh(active, since 9m), standbys: ceph1.smfvfd
>>>>>>>     osd: 4 osds: 4 up (since 9h), 4 in (since 11h)
>>>>>>>
>>>>>>>   data:
>>>>>>>     pools:   5 pools, 129 pgs
>>>>>>>     objects: 20.06k objects, 83 GiB
>>>>>>>     usage:   168 GiB used, 632 GiB / 800 GiB avail
>>>>>>>     pgs:     129 active+clean
>>>>>>>
>>>>>>>   io:
>>>>>>>     client:   12 KiB/s wr, 0 op/s rd, 1 op/s wr
>>>>>>>
>>>>>>>   progress:
>>>>>>>     Upgrade to quay.io/ceph/ceph:v16.2.10 (0s)
>>>>>>>       [............................]
>>>>>>>
>>>>>>> On Fri, Sep 2, 2022 at 10:25 AM Satish Patel <satish.txt@xxxxxxxxx> wrote:
>>>>>>>
>>>>>>>> It looks like I did it with the following command:
>>>>>>>>
>>>>>>>> $ ceph orch daemon add mgr ceph2:10.73.0.192
>>>>>>>>
>>>>>>>> Now I can see two mgr daemons, both on the same 15.x version:
>>>>>>>>
>>>>>>>> root@ceph1:~# ceph orch ps --daemon-type mgr
>>>>>>>> NAME              HOST   STATUS         REFRESHED  AGE  VERSION  IMAGE NAME                                                                                  IMAGE ID      CONTAINER ID
>>>>>>>> mgr.ceph1.smfvfd  ceph1  running (8h)   41s ago    8h   15.2.17  quay.io/ceph/ceph@sha256:c08064dde4bba4e72a1f55d90ca32df9ef5aafab82efe2e0a0722444a5aaacca  93146564743f  1aab837306d2
>>>>>>>> mgr.ceph2.huidoh  ceph2  running (60s)  110s ago   60s  15.2.17  quay.io/ceph/ceph@sha256:c08064dde4bba4e72a1f55d90ca32df9ef5aafab82efe2e0a0722444a5aaacca  93146564743f  294fd6ab6c97
>>>>>>>>
>>>>>>>> On Fri, Sep 2, 2022 at 10:19 AM Satish Patel <satish.txt@xxxxxxxxx> wrote:
>>>>>>>>
>>>>>>>>> Let's come back to the original question: how do I bring back the second mgr?
>>>>>>>>>
>>>>>>>>> root@ceph1:~# ceph orch apply mgr 2
>>>>>>>>> Scheduled mgr update...
>>>>>>>>>
>>>>>>>>> Nothing happened with the above command, and the logs say nothing new:
>>>>>>>>>
>>>>>>>>> 2022-09-02T14:16:20.407927+0000 mgr.ceph1.smfvfd (mgr.334626) 16939 : cephadm [INF] refreshing ceph2 facts
>>>>>>>>> 2022-09-02T14:16:40.247195+0000 mgr.ceph1.smfvfd (mgr.334626) 16952 : cephadm [INF] Saving service mgr spec with placement count:2
>>>>>>>>> 2022-09-02T14:16:53.106919+0000 mgr.ceph1.smfvfd (mgr.334626) 16961 : cephadm [INF] Saving service mgr spec with placement count:2
>>>>>>>>> 2022-09-02T14:17:19.135203+0000 mgr.ceph1.smfvfd (mgr.334626) 16975 : cephadm [INF] refreshing ceph1 facts
>>>>>>>>> 2022-09-02T14:17:20.780496+0000 mgr.ceph1.smfvfd (mgr.334626) 16977 : cephadm [INF] refreshing ceph2 facts
>>>>>>>>> 2022-09-02T14:18:19.502034+0000 mgr.ceph1.smfvfd (mgr.334626) 17008 : cephadm [INF] refreshing ceph1 facts
>>>>>>>>> 2022-09-02T14:18:21.127973+0000 mgr.ceph1.smfvfd (mgr.334626) 17010 : cephadm [INF] refreshing ceph2 facts
>>>>>>>>>
>>>>>>>>> On Fri, Sep 2, 2022 at 10:15 AM Satish Patel <satish.txt@xxxxxxxxx> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Adam,
>>>>>>>>>>
>>>>>>>>>> Wait.. wait.. now it's suddenly working without my doing anything. Very odd.
>>>>>>>>>>
>>>>>>>>>> root@ceph1:~# ceph orch ls
>>>>>>>>>> NAME                  RUNNING  REFRESHED  AGE  PLACEMENT    IMAGE NAME                                                                                  IMAGE ID
>>>>>>>>>> alertmanager          1/1      5s ago     2w   count:1      quay.io/prometheus/alertmanager:v0.20.0                                                     0881eb8f169f
>>>>>>>>>> crash                 2/2      5s ago     2w   *            quay.io/ceph/ceph:v15                                                                       93146564743f
>>>>>>>>>> grafana               1/1      5s ago     2w   count:1      quay.io/ceph/ceph-grafana:6.7.4                                                             557c83e11646
>>>>>>>>>> mgr                   1/2      5s ago     8h   count:2      quay.io/ceph/ceph@sha256:c08064dde4bba4e72a1f55d90ca32df9ef5aafab82efe2e0a0722444a5aaacca  93146564743f
>>>>>>>>>> mon                   1/2      5s ago     8h   ceph1;ceph2  quay.io/ceph/ceph:v15                                                                       93146564743f
>>>>>>>>>> node-exporter         2/2      5s ago     2w   *            quay.io/prometheus/node-exporter:v0.18.1                                                    e5a616e4b9cf
>>>>>>>>>> osd.osd_spec_default  4/0      5s ago     -    <unmanaged>  quay.io/ceph/ceph:v15                                                                       93146564743f
>>>>>>>>>> prometheus            1/1      5s ago     2w   count:1      quay.io/prometheus/prometheus:v2.18.1
>>>>>>>>>>
>>>>>>>>>> On Fri, Sep 2, 2022 at 10:13 AM Satish Patel <satish.txt@xxxxxxxxx> wrote:
>>>>>>>>>>
>>>>>>>>>>> I can see that entry in the output, but I'm not sure how to get rid of it.
>>>>>>>>>>>
>>>>>>>>>>> root@ceph1:~# ceph orch ps --refresh
>>>>>>>>>>> NAME  HOST  STATUS  REFRESHED  AGE  VERSION  IMAGE NAME  IMAGE ID  CONTAINER ID
>>>>>>>>>>> alertmanager.ceph1  ceph1  running (9h)  64s ago  2w  0.20.0  quay.io/prometheus/alertmanager:v0.20.0  0881eb8f169f  ba804b555378
>>>>>>>>>>> cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d  ceph2  stopped  65s ago  -  <unknown>  <unknown>  <unknown>  <unknown>
>>>>>>>>>>> crash.ceph1  ceph1  running (9h)  64s ago  2w  15.2.17  quay.io/ceph/ceph:v15  93146564743f  a3a431d834fc
>>>>>>>>>>> crash.ceph2  ceph2  running (9h)  65s ago  13d  15.2.17  quay.io/ceph/ceph:v15  93146564743f  3c963693ff2b
>>>>>>>>>>> grafana.ceph1  ceph1  running (9h)  64s ago  2w  6.7.4  quay.io/ceph/ceph-grafana:6.7.4  557c83e11646  7583a8dc4c61
>>>>>>>>>>> mgr.ceph1.smfvfd  ceph1  running (8h)  64s ago  8h  15.2.17  quay.io/ceph/ceph@sha256:c08064dde4bba4e72a1f55d90ca32df9ef5aafab82efe2e0a0722444a5aaacca  93146564743f  1aab837306d2
>>>>>>>>>>> mon.ceph1  ceph1  running (9h)  64s ago  2w  15.2.17  quay.io/ceph/ceph:v15  93146564743f  c1d155d8c7ad
>>>>>>>>>>> node-exporter.ceph1  ceph1  running (9h)  64s ago  2w  0.18.1  quay.io/prometheus/node-exporter:v0.18.1  e5a616e4b9cf  2ff235fe0e42
>>>>>>>>>>> node-exporter.ceph2  ceph2  running (9h)  65s ago  13d  0.18.1  quay.io/prometheus/node-exporter:v0.18.1  e5a616e4b9cf  17678b9ba602
>>>>>>>>>>> osd.0  ceph1  running (9h)  64s ago  13d  15.2.17  quay.io/ceph/ceph:v15  93146564743f  d0fd73b777a3
>>>>>>>>>>> osd.1  ceph1  running (9h)  64s ago  13d  15.2.17  quay.io/ceph/ceph:v15  93146564743f  049120e83102
>>>>>>>>>>> osd.2  ceph2  running (9h)  65s ago  13d  15.2.17  quay.io/ceph/ceph:v15  93146564743f  8700e8cefd1f
>>>>>>>>>>> osd.3  ceph2  running (9h)  65s ago  13d  15.2.17  quay.io/ceph/ceph:v15  93146564743f  9c71bc87ed16
>>>>>>>>>>> prometheus.ceph1  ceph1  running (9h)  64s ago  2w  2.18.1  quay.io/prometheus/prometheus:v2.18.1  de242295e225  74a538efd61e
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Sep 2, 2022 at 10:10 AM Adam King <adking@xxxxxxxxxx> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> maybe also a "ceph orch ps --refresh"? It might still have the old cached daemon inventory from before you removed the files.
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Sep 2, 2022 at 9:57 AM Satish Patel <satish.txt@xxxxxxxxx> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Adam,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have deleted the file located here:
>>>>>>>>>>>>>
>>>>>>>>>>>>> rm /var/lib/ceph/f270ad9e-1f6f-11ed-b6f8-a539d87379ea/cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d
>>>>>>>>>>>>>
>>>>>>>>>>>>> But I am still getting the same error. Do I need to do anything else?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Sep 2, 2022 at 9:51 AM Adam King <adking@xxxxxxxxxx> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Okay, I'm wondering if this is a version-mismatch issue: having previously had a 16.2.10 mgr and now having a 15.2.17 one that doesn't expect this sort of thing to be present.
>>>>>>>>>>>>>> Either way, I'd think just deleting this cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d file (and any others like it) would be the way forward to get "ceph orch ls" working again.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Sep 2, 2022 at 9:44 AM Satish Patel <satish.txt@xxxxxxxxx> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Adam,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In the "cephadm ls" output I found the following entry, but I believe it was there before as well:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>         "style": "cephadm:v1",
>>>>>>>>>>>>>>>         "name": "cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d",
>>>>>>>>>>>>>>>         "fsid": "f270ad9e-1f6f-11ed-b6f8-a539d87379ea",
>>>>>>>>>>>>>>>         "systemd_unit": "ceph-f270ad9e-1f6f-11ed-b6f8-a539d87379ea@cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d",
>>>>>>>>>>>>>>>         "enabled": false,
>>>>>>>>>>>>>>>         "state": "stopped",
>>>>>>>>>>>>>>>         "container_id": null,
>>>>>>>>>>>>>>>         "container_image_name": null,
>>>>>>>>>>>>>>>         "container_image_id": null,
>>>>>>>>>>>>>>>         "version": null,
>>>>>>>>>>>>>>>         "started": null,
>>>>>>>>>>>>>>>         "created": null,
>>>>>>>>>>>>>>>         "deployed": null,
>>>>>>>>>>>>>>>         "configured": null
>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It looks like the remove didn't work:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> root@ceph1:~# ceph orch rm cephadm
>>>>>>>>>>>>>>> Failed to remove service. <cephadm> was not found.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> root@ceph1:~# ceph orch rm cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d
>>>>>>>>>>>>>>> Failed to remove service. <cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d> was not found.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, Sep 2, 2022 at 8:27 AM Adam King <adking@xxxxxxxxxx> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> this looks like an old traceback you would get if you ended up with a service type that shouldn't be there somehow. The thing I'd probably check first is that "cephadm ls" on either host definitely doesn't report any strange things that aren't actually daemons in your cluster, such as "cephadm.<hash>". Another thing you could maybe try, as I believe the assertion it's giving is for an unknown service type here ("AssertionError: cephadm"), is just "ceph orch rm cephadm", which would maybe cause it to remove whatever it thinks is this "cephadm" service that it has deployed. Lastly, you could try having the mgr you manually deploy be a 16.2.10 one instead of 15.2.17 (I'm assuming here, but the line numbers in that traceback suggest octopus). The 16.2.10 one is just much less likely to have a bug that causes something like this.
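>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> As a quick sketch, the first two checks would be something like:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>   # on each host, look for entries like "cephadm.<hash>" that aren't real daemons
>>>>>>>>>>>>>>>>   cephadm ls
>>>>>>>>>>>>>>>>   # then see if the orchestrator will drop the unknown "cephadm" service
>>>>>>>>>>>>>>>>   ceph orch rm cephadm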
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Fri, Sep 2, 2022 at 1:41 AM Satish Patel <satish.txt@xxxxxxxxx> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Now when I run "ceph orch ps" it works, but the following command throws an error. I tried to bring up the second mgr with the "ceph orch apply mgr" command, but that didn't help.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> root@ceph1:/ceph-disk# ceph version
>>>>>>>>>>>>>>>>> ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus (stable)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> root@ceph1:/ceph-disk# ceph orch ls
>>>>>>>>>>>>>>>>> Error EINVAL: Traceback (most recent call last):
>>>>>>>>>>>>>>>>>   File "/usr/share/ceph/mgr/mgr_module.py", line 1212, in _handle_command
>>>>>>>>>>>>>>>>>     return self.handle_command(inbuf, cmd)
>>>>>>>>>>>>>>>>>   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 140, in handle_command
>>>>>>>>>>>>>>>>>     return dispatch[cmd['prefix']].call(self, cmd, inbuf)
>>>>>>>>>>>>>>>>>   File "/usr/share/ceph/mgr/mgr_module.py", line 320, in call
>>>>>>>>>>>>>>>>>     return self.func(mgr, **kwargs)
>>>>>>>>>>>>>>>>>   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 102, in <lambda>
>>>>>>>>>>>>>>>>>     wrapper_copy = lambda *l_args, **l_kwargs: wrapper(*l_args, **l_kwargs)
>>>>>>>>>>>>>>>>>   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 91, in wrapper
>>>>>>>>>>>>>>>>>     return func(*args, **kwargs)
>>>>>>>>>>>>>>>>>   File "/usr/share/ceph/mgr/orchestrator/module.py", line 503, in _list_services
>>>>>>>>>>>>>>>>>     raise_if_exception(completion)
>>>>>>>>>>>>>>>>>   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 642, in raise_if_exception
>>>>>>>>>>>>>>>>>     raise e
>>>>>>>>>>>>>>>>> AssertionError: cephadm
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Fri, Sep 2, 2022 at 1:32 AM Satish Patel <satish.txt@xxxxxxxxx> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> > Never mind, I found the relevant doc and was able to get 1 mgr up:
>>>>>>>>>>>>>>>>> > https://docs.ceph.com/en/quincy/cephadm/troubleshooting/#manually-deploying-a-mgr-daemon
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > On Fri, Sep 2, 2022 at 1:21 AM Satish Patel <satish.txt@xxxxxxxxx> wrote:
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> >> Folks,
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >> I am having a "fun" time with cephadm, and it is very annoying to deal with.
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >> I deployed a Ceph cluster with cephadm on two nodes. When I tried to upgrade, I hit a hiccup where it upgraded a single mgr to 16.2.10 but not the other, so I started messing around and somehow deleted both mgr daemons, thinking cephadm would recreate them.
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >> Now I don't have a single mgr left, so my ceph orch commands hang forever; it looks like a chicken-and-egg issue.
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >> How do I recover from this? If I can't run the ceph orch command, I won't be able to redeploy my mgr daemons.
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >> The following command finds no mgr on either node:
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >> $ cephadm ls | grep mgr

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx