Let's come back to the original question: how do I bring back the second mgr?

root@ceph1:~# ceph orch apply mgr 2
Scheduled mgr update...

Nothing happened after the above command; the logs show nothing beyond the
spec being saved:

2022-09-02T14:16:20.407927+0000 mgr.ceph1.smfvfd (mgr.334626) 16939 : cephadm [INF] refreshing ceph2 facts
2022-09-02T14:16:40.247195+0000 mgr.ceph1.smfvfd (mgr.334626) 16952 : cephadm [INF] Saving service mgr spec with placement count:2
2022-09-02T14:16:53.106919+0000 mgr.ceph1.smfvfd (mgr.334626) 16961 : cephadm [INF] Saving service mgr spec with placement count:2
2022-09-02T14:17:19.135203+0000 mgr.ceph1.smfvfd (mgr.334626) 16975 : cephadm [INF] refreshing ceph1 facts
2022-09-02T14:17:20.780496+0000 mgr.ceph1.smfvfd (mgr.334626) 16977 : cephadm [INF] refreshing ceph2 facts
2022-09-02T14:18:19.502034+0000 mgr.ceph1.smfvfd (mgr.334626) 17008 : cephadm [INF] refreshing ceph1 facts
2022-09-02T14:18:21.127973+0000 mgr.ceph1.smfvfd (mgr.334626) 17010 : cephadm [INF] refreshing ceph2 facts

On Fri, Sep 2, 2022 at 10:15 AM Satish Patel <satish.txt@xxxxxxxxx> wrote:

> Hi Adam,
>
> Wait... wait... now it's suddenly working without my doing anything. Very odd.
>
> root@ceph1:~# ceph orch ls
> NAME                  RUNNING  REFRESHED  AGE  PLACEMENT    IMAGE NAME                                                                                 IMAGE ID
> alertmanager          1/1      5s ago     2w   count:1      quay.io/prometheus/alertmanager:v0.20.0                                                    0881eb8f169f
> crash                 2/2      5s ago     2w   *            quay.io/ceph/ceph:v15                                                                      93146564743f
> grafana               1/1      5s ago     2w   count:1      quay.io/ceph/ceph-grafana:6.7.4                                                            557c83e11646
> mgr                   1/2      5s ago     8h   count:2      quay.io/ceph/ceph@sha256:c08064dde4bba4e72a1f55d90ca32df9ef5aafab82efe2e0a0722444a5aaacca  93146564743f
> mon                   1/2      5s ago     8h   ceph1;ceph2  quay.io/ceph/ceph:v15                                                                      93146564743f
> node-exporter         2/2      5s ago     2w   *            quay.io/prometheus/node-exporter:v0.18.1                                                   e5a616e4b9cf
> osd.osd_spec_default  4/0      5s ago     -    <unmanaged>  quay.io/ceph/ceph:v15                                                                      93146564743f
> prometheus            1/1      5s ago     2w   count:1      quay.io/prometheus/prometheus:v2.18.1
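One thing worth trying when a bare count:2 stalls like this is to pin the
mgr service to explicit hosts, so the scheduler knows exactly where the
second daemon belongs. A minimal sketch, assuming the hosts are named ceph1
and ceph2 as in the listings above:

    # Pin the mgr spec to both hosts instead of an anonymous count:2.
    ceph orch apply mgr --placement="ceph1;ceph2"

    # Confirm the spec was saved and watch for the second daemon to appear.
    ceph orch ls mgr
    ceph orch ps --daemon-type mgr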
> On Fri, Sep 2, 2022 at 10:13 AM Satish Patel <satish.txt@xxxxxxxxx> wrote:
>
>> I can see it in the output, but I'm not sure how to get rid of it.
>>
>> root@ceph1:~# ceph orch ps --refresh
>> NAME                                                                      HOST   STATUS        REFRESHED  AGE  VERSION    IMAGE NAME                                 IMAGE ID      CONTAINER ID
>> alertmanager.ceph1                                                        ceph1  running (9h)  64s ago    2w   0.20.0     quay.io/prometheus/alertmanager:v0.20.0    0881eb8f169f  ba804b555378
>> cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d  ceph2  stopped       65s ago    -    <unknown>  <unknown>                                  <unknown>     <unknown>
>> crash.ceph1                                                               ceph1  running (9h)  64s ago    2w   15.2.17    quay.io/ceph/ceph:v15                      93146564743f  a3a431d834fc
>> crash.ceph2                                                               ceph2  running (9h)  65s ago    13d  15.2.17    quay.io/ceph/ceph:v15                      93146564743f  3c963693ff2b
>> grafana.ceph1                                                             ceph1  running (9h)  64s ago    2w   6.7.4      quay.io/ceph/ceph-grafana:6.7.4            557c83e11646  7583a8dc4c61
>> mgr.ceph1.smfvfd                                                          ceph1  running (8h)  64s ago    8h   15.2.17    quay.io/ceph/ceph@sha256:c08064dde4bba4e72a1f55d90ca32df9ef5aafab82efe2e0a0722444a5aaacca  93146564743f  1aab837306d2
>> mon.ceph1                                                                 ceph1  running (9h)  64s ago    2w   15.2.17    quay.io/ceph/ceph:v15                      93146564743f  c1d155d8c7ad
>> node-exporter.ceph1                                                       ceph1  running (9h)  64s ago    2w   0.18.1     quay.io/prometheus/node-exporter:v0.18.1   e5a616e4b9cf  2ff235fe0e42
>> node-exporter.ceph2                                                       ceph2  running (9h)  65s ago    13d  0.18.1     quay.io/prometheus/node-exporter:v0.18.1   e5a616e4b9cf  17678b9ba602
>> osd.0                                                                     ceph1  running (9h)  64s ago    13d  15.2.17    quay.io/ceph/ceph:v15                      93146564743f  d0fd73b777a3
>> osd.1                                                                     ceph1  running (9h)  64s ago    13d  15.2.17    quay.io/ceph/ceph:v15                      93146564743f  049120e83102
>> osd.2                                                                     ceph2  running (9h)  65s ago    13d  15.2.17    quay.io/ceph/ceph:v15                      93146564743f  8700e8cefd1f
>> osd.3                                                                     ceph2  running (9h)  65s ago    13d  15.2.17    quay.io/ceph/ceph:v15                      93146564743f  9c71bc87ed16
>> prometheus.ceph1                                                          ceph1  running (9h)  64s ago    2w   2.18.1     quay.io/prometheus/prometheus:v2.18.1      de242295e225  74a538efd61e
>>
>> On Fri, Sep 2, 2022 at 10:10 AM Adam King <adking@xxxxxxxxxx> wrote:
>>
>>> Maybe also a "ceph orch ps --refresh"? It might still have the old
>>> cached daemon inventory from before you removed the files.
>>>
>>> On Fri, Sep 2, 2022 at 9:57 AM Satish Patel <satish.txt@xxxxxxxxx> wrote:
>>>
>>>> Hi Adam,
>>>>
>>>> I have deleted the file located here:
>>>>
>>>> rm /var/lib/ceph/f270ad9e-1f6f-11ed-b6f8-a539d87379ea/cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d
>>>>
>>>> But I'm still getting the same error. Do I need to do anything else?
>>>>
>>>> On Fri, Sep 2, 2022 at 9:51 AM Adam King <adking@xxxxxxxxxx> wrote:
>>>>
>>>>> Okay, I'm wondering if this is an issue with a version mismatch:
>>>>> having previously had a 16.2.10 mgr and now having a 15.2.17 one that
>>>>> doesn't expect this sort of thing to be present. Either way, I'd think
>>>>> just deleting this
>>>>> cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d
>>>>> file (and any others like it) would be the way forward to get "ceph
>>>>> orch ls" working again.
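If deleting the daemon directory by hand doesn't clear the phantom entry,
cephadm can also remove it itself and clean up the systemd unit at the same
time. A sketch, run on the host that reports the stray entry (ceph2 here),
with the name and fsid taken from the "cephadm ls" output quoted below:

    # Remove the phantom daemon record and its systemd unit on that host.
    cephadm rm-daemon \
        --fsid f270ad9e-1f6f-11ed-b6f8-a539d87379ea \
        --name cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d \
        --force

    # Then force the orchestrator to re-scan so the cached entry drops out.
    ceph orch ps --refresh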
>>>>> On Fri, Sep 2, 2022 at 9:44 AM Satish Patel <satish.txt@xxxxxxxxx> wrote:
>>>>>
>>>>>> Hi Adam,
>>>>>>
>>>>>> In "cephadm ls" I found the following service, but I believe it was
>>>>>> there before as well:
>>>>>>
>>>>>> {
>>>>>>     "style": "cephadm:v1",
>>>>>>     "name": "cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d",
>>>>>>     "fsid": "f270ad9e-1f6f-11ed-b6f8-a539d87379ea",
>>>>>>     "systemd_unit": "ceph-f270ad9e-1f6f-11ed-b6f8-a539d87379ea@cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d",
>>>>>>     "enabled": false,
>>>>>>     "state": "stopped",
>>>>>>     "container_id": null,
>>>>>>     "container_image_name": null,
>>>>>>     "container_image_id": null,
>>>>>>     "version": null,
>>>>>>     "started": null,
>>>>>>     "created": null,
>>>>>>     "deployed": null,
>>>>>>     "configured": null
>>>>>> },
>>>>>>
>>>>>> It looks like the remove didn't work:
>>>>>>
>>>>>> root@ceph1:~# ceph orch rm cephadm
>>>>>> Failed to remove service. <cephadm> was not found.
>>>>>>
>>>>>> root@ceph1:~# ceph orch rm cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d
>>>>>> Failed to remove service. <cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d> was not found.
>>>>>>
>>>>>> On Fri, Sep 2, 2022 at 8:27 AM Adam King <adking@xxxxxxxxxx> wrote:
>>>>>>
>>>>>>> This looks like an old traceback you would get if you somehow ended
>>>>>>> up with a service type that shouldn't be there. The first thing I'd
>>>>>>> check is that "cephadm ls" on either host definitely doesn't report
>>>>>>> any strange things that aren't actually daemons in your cluster,
>>>>>>> such as "cephadm.<hash>". Another thing you could try, as I believe
>>>>>>> the assertion it's giving is for an unknown service type here
>>>>>>> ("AssertionError: cephadm"), is just "ceph orch rm cephadm", which
>>>>>>> would maybe cause it to remove whatever it thinks is this "cephadm"
>>>>>>> service that it has deployed. Lastly, you could try having the mgr
>>>>>>> you manually deploy be a 16.2.10 one instead of 15.2.17 (I'm
>>>>>>> assuming here, but the line numbers in that traceback suggest
>>>>>>> Octopus). The 16.2.10 one is just much less likely to have a bug
>>>>>>> that causes something like this.
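The manual mgr deploy Adam refers to is the procedure from the cephadm
troubleshooting guide (linked further down in this thread). A condensed
sketch, assuming a hypothetical daemon name of mgr.ceph1.test and the
16.2.10 image; see the guide for the exact config-json contents:

    # Create a keyring for the new mgr and generate a minimal ceph.conf.
    ceph auth get-or-create mgr.ceph1.test mon "profile mgr" osd "allow *" mds "allow *"
    ceph config generate-minimal-conf

    # Put the minimal config and keyring into config-json.json in the form
    # {"config": "<minimal conf>", "keyring": "<mgr keyring>"}, then deploy
    # the daemon with cephadm, pinning the newer image.
    cephadm --image quay.io/ceph/ceph:v16.2.10 deploy \
        --fsid f270ad9e-1f6f-11ed-b6f8-a539d87379ea \
        --name mgr.ceph1.test \
        --config-json config-json.json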
>>>>>>> On Fri, Sep 2, 2022 at 1:41 AM Satish Patel <satish.txt@xxxxxxxxx> wrote:
>>>>>>>
>>>>>>>> Now when I run "ceph orch ps" it works, but the following command
>>>>>>>> throws an error. Trying to bring up the second mgr with the "ceph
>>>>>>>> orch apply mgr" command didn't help either.
>>>>>>>>
>>>>>>>> root@ceph1:/ceph-disk# ceph version
>>>>>>>> ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus (stable)
>>>>>>>>
>>>>>>>> root@ceph1:/ceph-disk# ceph orch ls
>>>>>>>> Error EINVAL: Traceback (most recent call last):
>>>>>>>>   File "/usr/share/ceph/mgr/mgr_module.py", line 1212, in _handle_command
>>>>>>>>     return self.handle_command(inbuf, cmd)
>>>>>>>>   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 140, in handle_command
>>>>>>>>     return dispatch[cmd['prefix']].call(self, cmd, inbuf)
>>>>>>>>   File "/usr/share/ceph/mgr/mgr_module.py", line 320, in call
>>>>>>>>     return self.func(mgr, **kwargs)
>>>>>>>>   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 102, in <lambda>
>>>>>>>>     wrapper_copy = lambda *l_args, **l_kwargs: wrapper(*l_args, **l_kwargs)
>>>>>>>>   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 91, in wrapper
>>>>>>>>     return func(*args, **kwargs)
>>>>>>>>   File "/usr/share/ceph/mgr/orchestrator/module.py", line 503, in _list_services
>>>>>>>>     raise_if_exception(completion)
>>>>>>>>   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 642, in raise_if_exception
>>>>>>>>     raise e
>>>>>>>> AssertionError: cephadm
>>>>>>>>
>>>>>>>> On Fri, Sep 2, 2022 at 1:32 AM Satish Patel <satish.txt@xxxxxxxxx> wrote:
>>>>>>>>
>>>>>>>> > Never mind, I found the doc about this, and I was able to get one mgr up:
>>>>>>>> > https://docs.ceph.com/en/quincy/cephadm/troubleshooting/#manually-deploying-a-mgr-daemon
>>>>>>>> >
>>>>>>>> > On Fri, Sep 2, 2022 at 1:21 AM Satish Patel <satish.txt@xxxxxxxxx> wrote:
>>>>>>>> >
>>>>>>>> >> Folks,
>>>>>>>> >>
>>>>>>>> >> I am having a "fun" time with cephadm, and it's very annoying to deal
>>>>>>>> >> with.
>>>>>>>> >>
>>>>>>>> >> I have deployed a Ceph cluster using cephadm on two nodes. When I was
>>>>>>>> >> trying to upgrade, I hit a hiccup where it upgraded only a single mgr
>>>>>>>> >> to 16.2.10 but not the other, so I started messing around and somehow
>>>>>>>> >> deleted both mgr daemons, thinking cephadm would recreate them.
>>>>>>>> >>
>>>>>>>> >> Now I don't have a single mgr, so my "ceph orch" commands hang forever;
>>>>>>>> >> it looks like a chicken-and-egg issue.
>>>>>>>> >>
>>>>>>>> >> How do I recover from this? If I can't run the "ceph orch" command, I
>>>>>>>> >> won't be able to redeploy my mgr daemons.
>>>>>>>> >>
>>>>>>>> >> I am not able to find any mgr with the following command on either node:
>>>>>>>> >>
>>>>>>>> >> $ cephadm ls | grep mgr

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx