Hi Adam,

I have deleted the file located here:

rm /var/lib/ceph/f270ad9e-1f6f-11ed-b6f8-a539d87379ea/cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d

But I'm still getting the same error. Do I need to do anything else?

On Fri, Sep 2, 2022 at 9:51 AM Adam King <adking@xxxxxxxxxx> wrote:

> Okay, I'm wondering if this is a version-mismatch issue: having previously
> had a 16.2.10 mgr and now having a 15.2.17 one that doesn't expect this
> sort of thing to be present. Either way, I'd think just deleting this
> cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d
> file (and any others like it) would be the way forward to get "orch ls"
> working again.
>
> On Fri, Sep 2, 2022 at 9:44 AM Satish Patel <satish.txt@xxxxxxxxx> wrote:
>
>> Hi Adam,
>>
>> In "cephadm ls" I found the following entry, but I believe it was there
>> before as well.
>>
>> {
>>     "style": "cephadm:v1",
>>     "name": "cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d",
>>     "fsid": "f270ad9e-1f6f-11ed-b6f8-a539d87379ea",
>>     "systemd_unit": "ceph-f270ad9e-1f6f-11ed-b6f8-a539d87379ea@cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d",
>>     "enabled": false,
>>     "state": "stopped",
>>     "container_id": null,
>>     "container_image_name": null,
>>     "container_image_id": null,
>>     "version": null,
>>     "started": null,
>>     "created": null,
>>     "deployed": null,
>>     "configured": null
>> },
>>
>> Looks like the remove didn't work:
>>
>> root@ceph1:~# ceph orch rm cephadm
>> Failed to remove service. <cephadm> was not found.
>>
>> root@ceph1:~# ceph orch rm cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d
>> Failed to remove service.
>> <cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d>
>> was not found.
>>
>> On Fri, Sep 2, 2022 at 8:27 AM Adam King <adking@xxxxxxxxxx> wrote:
>>
>>> This looks like an old traceback you would get if you somehow ended up
>>> with a service type that shouldn't be there. The first thing I'd check
>>> is that "cephadm ls" on either host definitely doesn't report any
>>> strange things that aren't actually daemons in your cluster, such as
>>> "cephadm.<hash>". Another thing you could try, since I believe the
>>> assertion it's giving is for an unknown service type ("AssertionError:
>>> cephadm"), is "ceph orch rm cephadm", which might cause it to remove
>>> whatever it thinks this "cephadm" service is that it has deployed.
>>> Lastly, you could try having the mgr you manually deploy be a 16.2.10
>>> one instead of 15.2.17 (I'm assuming here, but the line numbers in that
>>> traceback suggest octopus). The 16.2.10 one is just much less likely to
>>> have a bug that causes something like this.
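For reference, a minimal sketch of the check and cleanup Adam describes, assuming jq is available on the hosts, that every stray entry matches the "cephadm.<64-hex-digit hash>" naming pattern, and that failing over the active mgr is enough to make it re-read the on-disk daemon state (the fsid and hash below are the ones from this thread):

# on each host: list the daemon names cephadm knows about and pick out
# anything that looks like a stray "cephadm.<hash>" entry
cephadm ls | jq -r '.[].name' | grep -E '^cephadm\.[0-9a-f]{64}$'

# remove the leftover file for each match under the cluster directory
rm /var/lib/ceph/f270ad9e-1f6f-11ed-b6f8-a539d87379ea/cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d

# fail over the active mgr so a fresh one starts and re-scans daemon state
ceph mgr fail $(ceph mgr dump | jq -r '.active_name')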
>>> On Fri, Sep 2, 2022 at 1:41 AM Satish Patel <satish.txt@xxxxxxxxx> wrote:
>>>
>>>> Now when I run "ceph orch ps" it works, but the following command
>>>> throws an error. I'm trying to bring up a second mgr using the
>>>> "ceph orch apply mgr" command, but it didn't help.
>>>>
>>>> root@ceph1:/ceph-disk# ceph version
>>>> ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus (stable)
>>>>
>>>> root@ceph1:/ceph-disk# ceph orch ls
>>>> Error EINVAL: Traceback (most recent call last):
>>>>   File "/usr/share/ceph/mgr/mgr_module.py", line 1212, in _handle_command
>>>>     return self.handle_command(inbuf, cmd)
>>>>   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 140, in handle_command
>>>>     return dispatch[cmd['prefix']].call(self, cmd, inbuf)
>>>>   File "/usr/share/ceph/mgr/mgr_module.py", line 320, in call
>>>>     return self.func(mgr, **kwargs)
>>>>   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 102, in <lambda>
>>>>     wrapper_copy = lambda *l_args, **l_kwargs: wrapper(*l_args, **l_kwargs)
>>>>   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 91, in wrapper
>>>>     return func(*args, **kwargs)
>>>>   File "/usr/share/ceph/mgr/orchestrator/module.py", line 503, in _list_services
>>>>     raise_if_exception(completion)
>>>>   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 642, in raise_if_exception
>>>>     raise e
>>>> AssertionError: cephadm
>>>>
>>>> On Fri, Sep 2, 2022 at 1:32 AM Satish Patel <satish.txt@xxxxxxxxx> wrote:
>>>>
>>>> > Never mind, I found the doc related to that, and I am able to get one
>>>> > mgr up:
>>>> > https://docs.ceph.com/en/quincy/cephadm/troubleshooting/#manually-deploying-a-mgr-daemon
>>>> >
>>>> > On Fri, Sep 2, 2022 at 1:21 AM Satish Patel <satish.txt@xxxxxxxxx> wrote:
>>>> >
>>>> >> Folks,
>>>> >>
>>>> >> I am having a fun time with cephadm, and it's very annoying to deal
>>>> >> with.
>>>> >>
>>>> >> I have deployed a Ceph cluster using cephadm on two nodes. When I
>>>> >> was trying to upgrade, I noticed a hiccup where it upgraded a single
>>>> >> mgr to 16.2.10 but not the other, so I started messing around and
>>>> >> somehow deleted both mgrs, thinking cephadm would recreate them.
>>>> >>
>>>> >> Now I don't have a single mgr, so my "ceph orch" commands hang
>>>> >> forever; it looks like a chicken-and-egg issue.
>>>> >>
>>>> >> How do I recover from this? If I can't run the "ceph orch" command,
>>>> >> I won't be able to redeploy my mgr daemons.
>>>> >>
>>>> >> I am not able to find any mgr with the following command on either
>>>> >> node:
>>>> >>
>>>> >> $ cephadm ls | grep mgr
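The recovery that unblocked things here is the manual mgr deployment from the linked troubleshooting page. A rough sketch of that procedure, where the daemon name mgr.ceph1.recovery and the <image> placeholder are illustrative assumptions (the fsid is the one from this thread):

# create auth for the new mgr daemon (the name is a hypothetical example)
ceph auth get-or-create mgr.ceph1.recovery mon "profile mgr" osd "allow *" mds "allow *"

# write a minimal ceph.conf; bundle its contents plus the keyring into
# config-json.json as {"config": "<conf contents>", "keyring": "<keyring contents>"}
ceph config generate-minimal-conf

# deploy the daemon directly with cephadm, bypassing the (dead) orchestrator;
# <image> stands in for the mgr container image, e.g. the 16.2.10 one Adam suggests
cephadm --image <image> deploy --fsid f270ad9e-1f6f-11ed-b6f8-a539d87379ea \
    --name mgr.ceph1.recovery --config-json config-json.json

Once a mgr is active again, "ceph orch apply mgr 2" (or a placement spec naming both hosts) should hand management of the second mgr back to the orchestrator.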