I can see that in the output, but I'm not sure how to get rid of it.

root@ceph1:~# ceph orch ps --refresh
NAME  HOST  STATUS  REFRESHED  AGE  VERSION  IMAGE NAME  IMAGE ID  CONTAINER ID
alertmanager.ceph1  ceph1  running (9h)  64s ago  2w  0.20.0  quay.io/prometheus/alertmanager:v0.20.0  0881eb8f169f  ba804b555378
cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d  ceph2  stopped  65s ago  -  <unknown>  <unknown>  <unknown>  <unknown>
crash.ceph1  ceph1  running (9h)  64s ago  2w  15.2.17  quay.io/ceph/ceph:v15  93146564743f  a3a431d834fc
crash.ceph2  ceph2  running (9h)  65s ago  13d  15.2.17  quay.io/ceph/ceph:v15  93146564743f  3c963693ff2b
grafana.ceph1  ceph1  running (9h)  64s ago  2w  6.7.4  quay.io/ceph/ceph-grafana:6.7.4  557c83e11646  7583a8dc4c61
mgr.ceph1.smfvfd  ceph1  running (8h)  64s ago  8h  15.2.17  quay.io/ceph/ceph@sha256:c08064dde4bba4e72a1f55d90ca32df9ef5aafab82efe2e0a0722444a5aaacca  93146564743f  1aab837306d2
mon.ceph1  ceph1  running (9h)  64s ago  2w  15.2.17  quay.io/ceph/ceph:v15  93146564743f  c1d155d8c7ad
node-exporter.ceph1  ceph1  running (9h)  64s ago  2w  0.18.1  quay.io/prometheus/node-exporter:v0.18.1  e5a616e4b9cf  2ff235fe0e42
node-exporter.ceph2  ceph2  running (9h)  65s ago  13d  0.18.1  quay.io/prometheus/node-exporter:v0.18.1  e5a616e4b9cf  17678b9ba602
osd.0  ceph1  running (9h)  64s ago  13d  15.2.17  quay.io/ceph/ceph:v15  93146564743f  d0fd73b777a3
osd.1  ceph1  running (9h)  64s ago  13d  15.2.17  quay.io/ceph/ceph:v15  93146564743f  049120e83102
osd.2  ceph2  running (9h)  65s ago  13d  15.2.17  quay.io/ceph/ceph:v15  93146564743f  8700e8cefd1f
osd.3  ceph2  running (9h)  65s ago  13d  15.2.17  quay.io/ceph/ceph:v15  93146564743f  9c71bc87ed16
prometheus.ceph1  ceph1  running (9h)  64s ago  2w  2.18.1  quay.io/prometheus/prometheus:v2.18.1  de242295e225  74a538efd61e

On Fri, Sep 2, 2022 at 10:10 AM Adam King <adking@xxxxxxxxxx> wrote:

> Maybe also a "ceph orch ps --refresh"? It might still have the old cached
> daemon inventory from before you removed the files.
>
> On Fri, Sep 2, 2022 at 9:57 AM Satish Patel <satish.txt@xxxxxxxxx> wrote:
>
>> Hi Adam,
>>
>> I have deleted the file located here:
>>
>> rm /var/lib/ceph/f270ad9e-1f6f-11ed-b6f8-a539d87379ea/cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d
>>
>> But I'm still getting the same error. Do I need to do anything else?
>>
>> On Fri, Sep 2, 2022 at 9:51 AM Adam King <adking@xxxxxxxxxx> wrote:
>>
>>> Okay, I'm wondering if this is an issue with a version mismatch: having
>>> previously had a 16.2.10 mgr and now having a 15.2.17 one that doesn't
>>> expect this sort of thing to be present. Either way, I'd think just
>>> deleting this
>>> cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d
>>> file (and any others like it) would be the way forward to get "orch ls"
>>> working again.
>>>
>>> On Fri, Sep 2, 2022 at 9:44 AM Satish Patel <satish.txt@xxxxxxxxx> wrote:
>>>
>>>> Hi Adam,
>>>>
>>>> In "cephadm ls" I found the following service, but I believe it was
>>>> there before as well.
>>>>
>>>> {
>>>>     "style": "cephadm:v1",
>>>>     "name": "cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d",
>>>>     "fsid": "f270ad9e-1f6f-11ed-b6f8-a539d87379ea",
>>>>     "systemd_unit": "ceph-f270ad9e-1f6f-11ed-b6f8-a539d87379ea@cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d",
>>>>     "enabled": false,
>>>>     "state": "stopped",
>>>>     "container_id": null,
>>>>     "container_image_name": null,
>>>>     "container_image_id": null,
>>>>     "version": null,
>>>>     "started": null,
>>>>     "created": null,
>>>>     "deployed": null,
>>>>     "configured": null
>>>> },
>>>>
>>>> It looks like the removal didn't work:
>>>>
>>>> root@ceph1:~# ceph orch rm cephadm
>>>> Failed to remove service. <cephadm> was not found.
>>>>
>>>> root@ceph1:~# ceph orch rm cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d
>>>> Failed to remove service.
>>>> <cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d> was not found.
>>>>
>>>> On Fri, Sep 2, 2022 at 8:27 AM Adam King <adking@xxxxxxxxxx> wrote:
>>>>
>>>>> This looks like an old traceback you would get if you somehow ended up
>>>>> with a service type that shouldn't be there. The first thing I'd check
>>>>> is that "cephadm ls" on either host definitely doesn't report any
>>>>> strange entries that aren't actually daemons in your cluster, such as
>>>>> "cephadm.<hash>". Another thing you could try, as I believe the
>>>>> assertion it's giving is for an unknown service type here
>>>>> ("AssertionError: cephadm"), is just "ceph orch rm cephadm", which
>>>>> might cause it to remove whatever it thinks this "cephadm" service is
>>>>> that it has deployed. Lastly, you could try having the mgr you manually
>>>>> deploy be a 16.2.10 one instead of 15.2.17 (I'm assuming here, but the
>>>>> line numbers in that traceback suggest Octopus). The 16.2.10 one is
>>>>> just much less likely to have a bug that causes something like this.
>>>>>
>>>>> On Fri, Sep 2, 2022 at 1:41 AM Satish Patel <satish.txt@xxxxxxxxx> wrote:
>>>>>
>>>>>> Now when I run "ceph orch ps" it works, but the following command
>>>>>> throws an error.
>>>>>> I'm trying to bring up a second mgr with "ceph orch apply mgr", but
>>>>>> that didn't help.
>>>>>>
>>>>>> root@ceph1:/ceph-disk# ceph version
>>>>>> ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus (stable)
>>>>>>
>>>>>> root@ceph1:/ceph-disk# ceph orch ls
>>>>>> Error EINVAL: Traceback (most recent call last):
>>>>>>   File "/usr/share/ceph/mgr/mgr_module.py", line 1212, in _handle_command
>>>>>>     return self.handle_command(inbuf, cmd)
>>>>>>   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 140, in handle_command
>>>>>>     return dispatch[cmd['prefix']].call(self, cmd, inbuf)
>>>>>>   File "/usr/share/ceph/mgr/mgr_module.py", line 320, in call
>>>>>>     return self.func(mgr, **kwargs)
>>>>>>   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 102, in <lambda>
>>>>>>     wrapper_copy = lambda *l_args, **l_kwargs: wrapper(*l_args, **l_kwargs)
>>>>>>   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 91, in wrapper
>>>>>>     return func(*args, **kwargs)
>>>>>>   File "/usr/share/ceph/mgr/orchestrator/module.py", line 503, in _list_services
>>>>>>     raise_if_exception(completion)
>>>>>>   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 642, in raise_if_exception
>>>>>>     raise e
>>>>>> AssertionError: cephadm
>>>>>>
>>>>>> On Fri, Sep 2, 2022 at 1:32 AM Satish Patel <satish.txt@xxxxxxxxx> wrote:
>>>>>>
>>>>>> > Never mind, I found the doc for that and was able to get one mgr up:
>>>>>> > https://docs.ceph.com/en/quincy/cephadm/troubleshooting/#manually-deploying-a-mgr-daemon
>>>>>> >
>>>>>> > On Fri, Sep 2, 2022 at 1:21 AM Satish Patel <satish.txt@xxxxxxxxx> wrote:
>>>>>> >
>>>>>> >> Folks,
>>>>>> >>
>>>>>> >> I am not having much fun with cephadm, and it's very annoying to
>>>>>> >> deal with.
>>>>>> >>
>>>>>> >> I deployed a Ceph cluster using cephadm on two nodes. When I tried
>>>>>> >> to upgrade, I hit a hiccup where it upgraded only a single mgr to
>>>>>> >> 16.2.10 but not the other, so I started messing around and somehow
>>>>>> >> deleted both mgrs, thinking cephadm would recreate them.
>>>>>> >>
>>>>>> >> Now I don't have a single mgr, so my "ceph orch" commands hang
>>>>>> >> forever; it looks like a chicken-and-egg issue.
>>>>>> >>
>>>>>> >> How do I recover from this? If I can't run the "ceph orch" command,
>>>>>> >> I won't be able to redeploy my mgr daemons.
>>>>>> >>
>>>>>> >> I am not able to find any mgr in the following command on either
>>>>>> >> node:
>>>>>> >>
>>>>>> >> $ cephadm ls | grep mgr
>>>>>> >>
>>>>>> >
>>>>>> _______________________________________________
>>>>>> ceph-users mailing list -- ceph-users@xxxxxxx
>>>>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
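
For reference, here is a condensed sketch of the cleanup Adam describes above, written out as shell commands. It only uses commands that already appear in this thread; it assumes the stray "cephadm.<hash>" record lives as a file under /var/lib/ceph/<fsid>/ on whichever host "ceph orch ps" reports it on (ceph2 in this thread), and the fsid and hash below are the ones from this thread, so they will differ on any other cluster.

# On each host, look for "cephadm ls" entries whose name starts with
# "cephadm." -- these are leftover records, not real daemons:
cephadm ls | grep '"name": "cephadm\.'

# Remove the leftover file for that record (this is the same path Satish
# deleted earlier; repeat on the other host if the record shows up there too):
rm /var/lib/ceph/f270ad9e-1f6f-11ed-b6f8-a539d87379ea/cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d

# Then have the orchestrator rebuild its cached daemon inventory and retry:
ceph orch ps --refresh
ceph orch ls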