Hi Adam,

I have deleted the file located here:

rm /var/lib/ceph/f270ad9e-1f6f-11ed-b6f8-a539d87379ea/cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d

But I'm still getting the same error. Do I need to do anything else?

On Fri, Sep 2, 2022 at 9:51 AM Adam King <adking@xxxxxxxxxx> wrote:

> Okay, I'm wondering if this is a version-mismatch issue: having previously
> had a 16.2.10 mgr and now having a 15.2.17 one that doesn't expect this
> sort of thing to be present. Either way, I'd think just deleting this
> cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d
> file (and any others like it) would be the way forward to get "orch ls"
> working again.
>
> On Fri, Sep 2, 2022 at 9:44 AM Satish Patel <satish.txt@xxxxxxxxx> wrote:
>
>> Hi Adam,
>>
>> In "cephadm ls" I found the following entry, but I believe it was there
>> before as well.
>>
>> {
>>     "style": "cephadm:v1",
>>     "name": "cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d",
>>     "fsid": "f270ad9e-1f6f-11ed-b6f8-a539d87379ea",
>>     "systemd_unit": "ceph-f270ad9e-1f6f-11ed-b6f8-a539d87379ea@cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d",
>>     "enabled": false,
>>     "state": "stopped",
>>     "container_id": null,
>>     "container_image_name": null,
>>     "container_image_id": null,
>>     "version": null,
>>     "started": null,
>>     "created": null,
>>     "deployed": null,
>>     "configured": null
>> },
>>
>> Looks like the remove didn't work:
>>
>> root@ceph1:~# ceph orch rm cephadm
>> Failed to remove service. <cephadm> was not found.
>>
>> root@ceph1:~# ceph orch rm cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d
>> Failed to remove service.
>> <cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d>
>> was not found.
>>
>> On Fri, Sep 2, 2022 at 8:27 AM Adam King <adking@xxxxxxxxxx> wrote:
>>
>>> This looks like an old traceback you would get if you somehow ended up
>>> with a service type that shouldn't be there. The first thing I'd check
>>> is that "cephadm ls" on either host definitely doesn't report any
>>> strange things that aren't actually daemons in your cluster, such as
>>> "cephadm.<hash>". Another thing you could try, since I believe the
>>> assertion it's giving is for an unknown service type ("AssertionError:
>>> cephadm"), is "ceph orch rm cephadm", which might cause it to remove
>>> whatever it thinks this "cephadm" service is that it has deployed.
>>> Lastly, you could try having the mgr you manually deploy be a 16.2.10
>>> one instead of 15.2.17 (I'm assuming here, but the line numbers in that
>>> traceback suggest octopus). The 16.2.10 one is just much less likely to
>>> have a bug that causes something like this.
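For reference, a minimal sketch of the check and cleanup Adam describes, assuming jq is available on the hosts, that every stray entry matches the "cephadm.<64-hex-digit hash>" naming pattern, and that failing over the active mgr is enough to make it re-read the on-disk daemon state (the fsid and hash below are the ones from this thread):

# on each host: list the daemon names cephadm knows about and pick out
# anything that looks like a stray "cephadm.<hash>" entry
cephadm ls | jq -r '.[].name' | grep -E '^cephadm\.[0-9a-f]{64}$'

# remove the leftover file for each match under the cluster directory
rm /var/lib/ceph/f270ad9e-1f6f-11ed-b6f8-a539d87379ea/cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d

# fail over the active mgr so a fresh one starts and re-scans daemon state
ceph mgr fail $(ceph mgr dump | jq -r '.active_name')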
>>> On Fri, Sep 2, 2022 at 1:41 AM Satish Patel <satish.txt@xxxxxxxxx> wrote:
>>>
>>>> Now when I run "ceph orch ps" it works, but the following command
>>>> throws an error. I'm trying to bring up a second mgr using the
>>>> "ceph orch apply mgr" command, but it didn't help.
>>>>
>>>> root@ceph1:/ceph-disk# ceph version
>>>> ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus (stable)
>>>>
>>>> root@ceph1:/ceph-disk# ceph orch ls
>>>> Error EINVAL: Traceback (most recent call last):
>>>>   File "/usr/share/ceph/mgr/mgr_module.py", line 1212, in _handle_command
>>>>     return self.handle_command(inbuf, cmd)
>>>>   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 140, in handle_command
>>>>     return dispatch[cmd['prefix']].call(self, cmd, inbuf)
>>>>   File "/usr/share/ceph/mgr/mgr_module.py", line 320, in call
>>>>     return self.func(mgr, **kwargs)
>>>>   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 102, in <lambda>
>>>>     wrapper_copy = lambda *l_args, **l_kwargs: wrapper(*l_args, **l_kwargs)
>>>>   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 91, in wrapper
>>>>     return func(*args, **kwargs)
>>>>   File "/usr/share/ceph/mgr/orchestrator/module.py", line 503, in _list_services
>>>>     raise_if_exception(completion)
>>>>   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 642, in raise_if_exception
>>>>     raise e
>>>> AssertionError: cephadm
>>>>
>>>> On Fri, Sep 2, 2022 at 1:32 AM Satish Patel <satish.txt@xxxxxxxxx> wrote:
>>>>
>>>> > Never mind, I found the doc related to that, and I am able to get one
>>>> > mgr up:
>>>> > https://docs.ceph.com/en/quincy/cephadm/troubleshooting/#manually-deploying-a-mgr-daemon
>>>> >
>>>> > On Fri, Sep 2, 2022 at 1:21 AM Satish Patel <satish.txt@xxxxxxxxx> wrote:
>>>> >
>>>> >> Folks,
>>>> >>
>>>> >> I am having a fun time with cephadm, and it's very annoying to deal
>>>> >> with.
>>>> >>
>>>> >> I have deployed a Ceph cluster using cephadm on two nodes. When I
>>>> >> was trying to upgrade, I noticed a hiccup where it upgraded a single
>>>> >> mgr to 16.2.10 but not the other, so I started messing around and
>>>> >> somehow deleted both mgrs, thinking cephadm would recreate them.
>>>> >>
>>>> >> Now I don't have a single mgr, so my "ceph orch" commands hang
>>>> >> forever; it looks like a chicken-and-egg issue.
>>>> >>
>>>> >> How do I recover from this? If I can't run the "ceph orch" command,
>>>> >> I won't be able to redeploy my mgr daemons.
>>>> >>
>>>> >> I am not able to find any mgr with the following command on either
>>>> >> node:
>>>> >>
>>>> >> $ cephadm ls | grep mgr
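The recovery that unblocked things here is the manual mgr deployment from the linked troubleshooting page. A rough sketch of that procedure, where the daemon name mgr.ceph1.recovery and the <image> placeholder are illustrative assumptions (the fsid is the one from this thread):

# create auth for the new mgr daemon (the name is a hypothetical example)
ceph auth get-or-create mgr.ceph1.recovery mon "profile mgr" osd "allow *" mds "allow *"

# write a minimal ceph.conf; bundle its contents plus the keyring into
# config-json.json as {"config": "<conf contents>", "keyring": "<keyring contents>"}
ceph config generate-minimal-conf

# deploy the daemon directly with cephadm, bypassing the (dead) orchestrator;
# <image> stands in for the mgr container image, e.g. the 16.2.10 one Adam suggests
cephadm --image <image> deploy --fsid f270ad9e-1f6f-11ed-b6f8-a539d87379ea \
    --name mgr.ceph1.recovery --config-json config-json.json

Once a mgr is active again, "ceph orch apply mgr 2" (or a placement spec naming both hosts) should hand management of the second mgr back to the orchestrator.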