Hi Adam,

Wait... wait... now it's suddenly working without my doing anything. Very odd.

root@ceph1:~# ceph orch ls
NAME                  RUNNING  REFRESHED  AGE  PLACEMENT    IMAGE NAME                                                                                 IMAGE ID
alertmanager          1/1      5s ago     2w   count:1      quay.io/prometheus/alertmanager:v0.20.0                                                    0881eb8f169f
crash                 2/2      5s ago     2w   *            quay.io/ceph/ceph:v15                                                                      93146564743f
grafana               1/1      5s ago     2w   count:1      quay.io/ceph/ceph-grafana:6.7.4                                                            557c83e11646
mgr                   1/2      5s ago     8h   count:2      quay.io/ceph/ceph@sha256:c08064dde4bba4e72a1f55d90ca32df9ef5aafab82efe2e0a0722444a5aaacca  93146564743f
mon                   1/2      5s ago     8h   ceph1;ceph2  quay.io/ceph/ceph:v15                                                                      93146564743f
node-exporter         2/2      5s ago     2w   *            quay.io/prometheus/node-exporter:v0.18.1                                                   e5a616e4b9cf
osd.osd_spec_default  4/0      5s ago     -    <unmanaged>  quay.io/ceph/ceph:v15                                                                      93146564743f
prometheus            1/1      5s ago     2w   count:1      quay.io/prometheus/prometheus:v2.18.1

On Fri, Sep 2, 2022 at 10:13 AM Satish Patel <satish.txt@xxxxxxxxx> wrote:

> I can see it in the output, but I'm not sure how to get rid of it.
>
> root@ceph1:~# ceph orch ps --refresh
> NAME  HOST  STATUS  REFRESHED  AGE  VERSION  IMAGE NAME  IMAGE ID  CONTAINER ID
> alertmanager.ceph1  ceph1  running (9h)  64s ago  2w  0.20.0  quay.io/prometheus/alertmanager:v0.20.0  0881eb8f169f  ba804b555378
> cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d  ceph2  stopped  65s ago  -  <unknown>  <unknown>  <unknown>  <unknown>
> crash.ceph1  ceph1  running (9h)  64s ago  2w  15.2.17  quay.io/ceph/ceph:v15  93146564743f  a3a431d834fc
> crash.ceph2  ceph2  running (9h)  65s ago  13d  15.2.17  quay.io/ceph/ceph:v15  93146564743f  3c963693ff2b
> grafana.ceph1  ceph1  running (9h)  64s ago  2w  6.7.4  quay.io/ceph/ceph-grafana:6.7.4  557c83e11646  7583a8dc4c61
> mgr.ceph1.smfvfd  ceph1  running (8h)  64s ago  8h  15.2.17  quay.io/ceph/ceph@sha256:c08064dde4bba4e72a1f55d90ca32df9ef5aafab82efe2e0a0722444a5aaacca  93146564743f  1aab837306d2
> mon.ceph1  ceph1  running (9h)  64s ago  2w  15.2.17  quay.io/ceph/ceph:v15  93146564743f  c1d155d8c7ad
> node-exporter.ceph1  ceph1  running (9h)  64s ago  2w  0.18.1  quay.io/prometheus/node-exporter:v0.18.1  e5a616e4b9cf  2ff235fe0e42
> node-exporter.ceph2  ceph2  running (9h)  65s ago  13d  0.18.1  quay.io/prometheus/node-exporter:v0.18.1  e5a616e4b9cf  17678b9ba602
> osd.0  ceph1  running (9h)  64s ago  13d  15.2.17  quay.io/ceph/ceph:v15  93146564743f  d0fd73b777a3
> osd.1  ceph1  running (9h)  64s ago  13d  15.2.17  quay.io/ceph/ceph:v15  93146564743f  049120e83102
> osd.2  ceph2  running (9h)  65s ago  13d  15.2.17  quay.io/ceph/ceph:v15  93146564743f  8700e8cefd1f
> osd.3  ceph2  running (9h)  65s ago  13d  15.2.17  quay.io/ceph/ceph:v15  93146564743f  9c71bc87ed16
> prometheus.ceph1  ceph1  running (9h)  64s ago  2w  2.18.1  quay.io/prometheus/prometheus:v2.18.1  de242295e225  74a538efd61e
>
> On Fri, Sep 2, 2022 at 10:10 AM Adam King <adking@xxxxxxxxxx> wrote:
>
>> Maybe also a "ceph orch ps --refresh"? It might still have the old cached
>> daemon inventory from before you removed the files.
>>
>> On Fri, Sep 2, 2022 at 9:57 AM Satish Patel <satish.txt@xxxxxxxxx> wrote:
>>
>>> Hi Adam,
>>>
>>> I have deleted the file located here:
>>> rm /var/lib/ceph/f270ad9e-1f6f-11ed-b6f8-a539d87379ea/cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d
>>>
>>> But I'm still getting the same error. Do I need to do anything else?
>>>
>>> On Fri, Sep 2, 2022 at 9:51 AM Adam King <adking@xxxxxxxxxx> wrote:
>>>
>>>> Okay, I'm wondering if this is an issue with a version mismatch: you
>>>> previously had a 16.2.10 mgr and now have a 15.2.17 one that doesn't
>>>> expect this sort of thing to be present. Either way, I'd think just
>>>> deleting this
>>>> cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d
>>>> (and any others like it) file would be the way forward to get orch ls
>>>> working again.
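
A rough sketch of the cleanup Adam describes, for anyone hitting the same thing. The fsid and hash below are the ones from this thread, so substitute your own; treat this as an outline rather than verified output:

    # on the host that reports the stray entry, see what cephadm thinks is deployed
    cephadm ls | grep 'cephadm\.'

    # in this thread the stray item was a plain file directly under the cluster's directory
    ls /var/lib/ceph/f270ad9e-1f6f-11ed-b6f8-a539d87379ea/ | grep '^cephadm\.'

    # remove it, then ask the orchestrator to drop its cached daemon inventory
    rm /var/lib/ceph/f270ad9e-1f6f-11ed-b6f8-a539d87379ea/cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d
    ceph orch ps --refresh
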
>>>>
>>>> On Fri, Sep 2, 2022 at 9:44 AM Satish Patel <satish.txt@xxxxxxxxx> wrote:
>>>>
>>>>> Hi Adam,
>>>>>
>>>>> In "cephadm ls" I found the following service, but I believe it was
>>>>> there before as well:
>>>>>
>>>>>     {
>>>>>         "style": "cephadm:v1",
>>>>>         "name": "cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d",
>>>>>         "fsid": "f270ad9e-1f6f-11ed-b6f8-a539d87379ea",
>>>>>         "systemd_unit": "ceph-f270ad9e-1f6f-11ed-b6f8-a539d87379ea@cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d",
>>>>>         "enabled": false,
>>>>>         "state": "stopped",
>>>>>         "container_id": null,
>>>>>         "container_image_name": null,
>>>>>         "container_image_id": null,
>>>>>         "version": null,
>>>>>         "started": null,
>>>>>         "created": null,
>>>>>         "deployed": null,
>>>>>         "configured": null
>>>>>     },
>>>>>
>>>>> It looks like the remove didn't work:
>>>>>
>>>>> root@ceph1:~# ceph orch rm cephadm
>>>>> Failed to remove service. <cephadm> was not found.
>>>>>
>>>>> root@ceph1:~# ceph orch rm cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d
>>>>> Failed to remove service. <cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d> was not found.
>>>>>
>>>>> On Fri, Sep 2, 2022 at 8:27 AM Adam King <adking@xxxxxxxxxx> wrote:
>>>>>
>>>>>> This looks like an old traceback you would get if you somehow ended up
>>>>>> with a service type that shouldn't be there. The first thing I'd check
>>>>>> is that "cephadm ls" on either host definitely doesn't report any
>>>>>> strange things that aren't actually daemons in your cluster, such as
>>>>>> "cephadm.<hash>". Another thing you could try, since I believe the
>>>>>> assertion it's raising is for an unknown service type here
>>>>>> ("AssertionError: cephadm"), is just "ceph orch rm cephadm", which might
>>>>>> cause it to remove whatever it thinks this deployed "cephadm" service
>>>>>> is. Lastly, you could try having the mgr you manually deploy be a
>>>>>> 16.2.10 one instead of 15.2.17 (I'm assuming versions here, but the line
>>>>>> numbers in that traceback suggest Octopus). The 16.2.10 one is just much
>>>>>> less likely to have a bug that causes something like this.
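
A quick way to do the "cephadm ls" check Adam mentions, as a sketch (it assumes jq is installed on the host; a python3 one-liner works as a fallback):

    # print just the names cephadm reports; anything that isn't a real daemon,
    # e.g. cephadm.<hash>, stands out immediately
    cephadm ls | jq -r '.[].name'

    # same thing without jq
    cephadm ls | python3 -c 'import json,sys; print("\n".join(d["name"] for d in json.load(sys.stdin)))'
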
>>>>>>
>>>>>> On Fri, Sep 2, 2022 at 1:41 AM Satish Patel <satish.txt@xxxxxxxxx> wrote:
>>>>>>
>>>>>>> Now when I run "ceph orch ps" it works, but the following command
>>>>>>> throws an error. I'm trying to bring up a second mgr with "ceph orch
>>>>>>> apply mgr", but that didn't help.
>>>>>>>
>>>>>>> root@ceph1:/ceph-disk# ceph version
>>>>>>> ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus (stable)
>>>>>>>
>>>>>>> root@ceph1:/ceph-disk# ceph orch ls
>>>>>>> Error EINVAL: Traceback (most recent call last):
>>>>>>>   File "/usr/share/ceph/mgr/mgr_module.py", line 1212, in _handle_command
>>>>>>>     return self.handle_command(inbuf, cmd)
>>>>>>>   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 140, in handle_command
>>>>>>>     return dispatch[cmd['prefix']].call(self, cmd, inbuf)
>>>>>>>   File "/usr/share/ceph/mgr/mgr_module.py", line 320, in call
>>>>>>>     return self.func(mgr, **kwargs)
>>>>>>>   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 102, in <lambda>
>>>>>>>     wrapper_copy = lambda *l_args, **l_kwargs: wrapper(*l_args, **l_kwargs)
>>>>>>>   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 91, in wrapper
>>>>>>>     return func(*args, **kwargs)
>>>>>>>   File "/usr/share/ceph/mgr/orchestrator/module.py", line 503, in _list_services
>>>>>>>     raise_if_exception(completion)
>>>>>>>   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 642, in raise_if_exception
>>>>>>>     raise e
>>>>>>> AssertionError: cephadm
>>>>>>>
>>>>>>> On Fri, Sep 2, 2022 at 1:32 AM Satish Patel <satish.txt@xxxxxxxxx> wrote:
>>>>>>>
>>>>>>> > Never mind, I found the doc for that and was able to get one mgr up:
>>>>>>> > https://docs.ceph.com/en/quincy/cephadm/troubleshooting/#manually-deploying-a-mgr-daemon
>>>>>>> >
>>>>>>> > On Fri, Sep 2, 2022 at 1:21 AM Satish Patel <satish.txt@xxxxxxxxx> wrote:
>>>>>>> >
>>>>>>> >> Folks,
>>>>>>> >>
>>>>>>> >> I am having a rough time with cephadm; it is very annoying to deal
>>>>>>> >> with.
>>>>>>> >>
>>>>>>> >> I deployed a Ceph cluster using cephadm on two nodes. When I tried to
>>>>>>> >> upgrade, I hit a hiccup: it upgraded only a single mgr to 16.2.10 and
>>>>>>> >> not the other, so I started messing around and somehow deleted both
>>>>>>> >> mgrs, thinking cephadm would recreate them.
>>>>>>> >>
>>>>>>> >> Now I don't have a single mgr, so my "ceph orch" commands hang
>>>>>>> >> forever; it looks like a chicken-and-egg issue.
>>>>>>> >>
>>>>>>> >> How do I recover from this? If I can't run "ceph orch" commands, I
>>>>>>> >> won't be able to redeploy my mgr daemons.
>>>>>>> >>
>>>>>>> >> I am not able to find any mgr with the following command on either node:
>>>>>>> >>
>>>>>>> >> $ cephadm ls | grep mgr
>>>>>>> >>
>>>>>>> >
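
For anyone who ends up in the same no-mgr, chicken-and-egg state, the manual recovery on the troubleshooting page linked above goes roughly like this. Treat it as a sketch reconstructed from that page rather than a tested recipe: the daemon name mgr.ceph1.smfvfd and the fsid are borrowed from this thread, and exact flags can vary between releases, so follow the docs for your version:

    # stop cephadm from tearing down the manually created mgr while you recover
    ceph config-key set mgr/cephadm/pause true

    # create an auth entry and grab a minimal ceph.conf for the new daemon
    ceph auth get-or-create mgr.ceph1.smfvfd mon "profile mgr" osd "allow *" mds "allow *"
    ceph config generate-minimal-conf

    # find out which container image the cluster expects
    # (per Adam's suggestion, a 16.2.10 image such as quay.io/ceph/ceph:v16.2.10 may be the safer choice)
    ceph config get mgr.ceph1.smfvfd container_image

    # put the minimal conf and the keyring into a config-json.json file
    # ({"config": "...", "keyring": "..."}), then deploy the daemon on the chosen host
    cephadm --image <container-image> deploy \
        --fsid f270ad9e-1f6f-11ed-b6f8-a539d87379ea \
        --name mgr.ceph1.smfvfd \
        --config-json config-json.json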