Okay, I'm wondering if this is an issue with a version mismatch: having
previously had a 16.2.10 mgr and now having a 15.2.17 one that doesn't
expect this sort of thing to be present. Either way, I'd think just
deleting this
cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d
file (and any others like it) would be the way forward to get orch ls
working again.

On Fri, Sep 2, 2022 at 9:44 AM Satish Patel <satish.txt@xxxxxxxxx> wrote:

> Hi Adam,
>
> In "cephadm ls" I found the following service, but I believe it was
> there before as well.
>
> {
>     "style": "cephadm:v1",
>     "name": "cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d",
>     "fsid": "f270ad9e-1f6f-11ed-b6f8-a539d87379ea",
>     "systemd_unit": "ceph-f270ad9e-1f6f-11ed-b6f8-a539d87379ea@cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d",
>     "enabled": false,
>     "state": "stopped",
>     "container_id": null,
>     "container_image_name": null,
>     "container_image_id": null,
>     "version": null,
>     "started": null,
>     "created": null,
>     "deployed": null,
>     "configured": null
> },
>
> Looks like remove didn't work:
>
> root@ceph1:~# ceph orch rm cephadm
> Failed to remove service. <cephadm> was not found.
>
> root@ceph1:~# ceph orch rm cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d
> Failed to remove service. <cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d> was not found.
>
> On Fri, Sep 2, 2022 at 8:27 AM Adam King <adking@xxxxxxxxxx> wrote:
>
>> This looks like an old traceback you would get if you ended up with a
>> service type that shouldn't be there somehow. The thing I'd probably
>> check is that "cephadm ls" on either host definitely doesn't report
>> any strange entries that aren't actually daemons in your cluster,
>> such as "cephadm.<hash>". Another thing you could maybe try, since I
>> believe the assertion it's giving is for an unknown service type
>> ("AssertionError: cephadm"), is just "ceph orch rm cephadm", which
>> might cause it to remove whatever it thinks is this "cephadm" service
>> that it has deployed. Lastly, you could try having the mgr you
>> manually deploy be a 16.2.10 one instead of 15.2.17 (I'm assuming
>> here, but the line numbers in that traceback suggest octopus). The
>> 16.2.10 one is just much less likely to have a bug that causes
>> something like this.
>>
>> On Fri, Sep 2, 2022 at 1:41 AM Satish Patel <satish.txt@xxxxxxxxx>
>> wrote:
>>
>>> Now when I run "ceph orch ps" it works, but the following command
>>> throws an error.
>>> Trying to bring up a second mgr using the "ceph orch apply mgr"
>>> command, but that didn't help.
>>>
>>> root@ceph1:/ceph-disk# ceph version
>>> ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4)
>>> octopus (stable)
>>>
>>> root@ceph1:/ceph-disk# ceph orch ls
>>> Error EINVAL: Traceback (most recent call last):
>>>   File "/usr/share/ceph/mgr/mgr_module.py", line 1212, in _handle_command
>>>     return self.handle_command(inbuf, cmd)
>>>   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 140, in handle_command
>>>     return dispatch[cmd['prefix']].call(self, cmd, inbuf)
>>>   File "/usr/share/ceph/mgr/mgr_module.py", line 320, in call
>>>     return self.func(mgr, **kwargs)
>>>   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 102, in <lambda>
>>>     wrapper_copy = lambda *l_args, **l_kwargs: wrapper(*l_args, **l_kwargs)
>>>   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 91, in wrapper
>>>     return func(*args, **kwargs)
>>>   File "/usr/share/ceph/mgr/orchestrator/module.py", line 503, in _list_services
>>>     raise_if_exception(completion)
>>>   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 642, in raise_if_exception
>>>     raise e
>>> AssertionError: cephadm
>>>
>>> On Fri, Sep 2, 2022 at 1:32 AM Satish Patel <satish.txt@xxxxxxxxx>
>>> wrote:
>>>
>>> > Never mind, I found the doc related to that and was able to get one
>>> > mgr up:
>>> > https://docs.ceph.com/en/quincy/cephadm/troubleshooting/#manually-deploying-a-mgr-daemon
>>> >
>>> > On Fri, Sep 2, 2022 at 1:21 AM Satish Patel <satish.txt@xxxxxxxxx>
>>> > wrote:
>>> >
>>> >> Folks,
>>> >>
>>> >> I am having a "fun" time with cephadm, and it's very annoying to
>>> >> deal with.
>>> >>
>>> >> I deployed a ceph cluster using cephadm on two nodes. When I tried
>>> >> to upgrade, I hit a hiccup where it upgraded only a single mgr to
>>> >> 16.2.10 and not the other, so I started messing around and somehow
>>> >> deleted both mgrs, thinking cephadm would recreate them.
>>> >>
>>> >> Now I don't have a single mgr, so my ceph orch commands hang
>>> >> forever; it looks like a chicken-and-egg issue.
>>> >>
>>> >> How do I recover from this? If I can't run the ceph orch command,
>>> >> I won't be able to redeploy my mgr daemons.
>>> >>
>>> >> I am not able to find any mgr with the following command on either
>>> >> node.
>>> >>
>>> >> $ cephadm ls | grep mgr
>>> >>
>>> >
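
For anyone who hits this later, a minimal sketch of the cleanup suggested
at the top of this thread, assuming the stray "cephadm.<hash>" entry is
the copy of the cephadm binary kept under /var/lib/ceph/<fsid>/ on each
host (that path, and the mgr failover at the end, are assumptions to
verify on your own cluster, not something confirmed in this thread):

# On each host, see what cephadm thinks is a daemon but isn't one
cephadm ls | grep cephadm.

# Confirm the stray file really lives in the cluster directory before
# touching anything (fsid taken from the listing earlier in the thread)
ls -l /var/lib/ceph/f270ad9e-1f6f-11ed-b6f8-a539d87379ea/cephadm.*

# Remove the stray file(s), then fail over the active mgr so the cephadm
# module refreshes its inventory (the mgr name below is a placeholder;
# use the active one reported by "ceph -s"), and re-check orch ls
rm /var/lib/ceph/f270ad9e-1f6f-11ed-b6f8-a539d87379ea/cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d
ceph mgr fail ceph1.xxxxxx
ceph orch ls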
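
And since the thread starts from the "no mgr at all" state, here is a
rough outline of the manual mgr deployment procedure from the
troubleshooting doc linked above. The daemon name, image, and
config-json contents are placeholders reconstructed from that doc, not
verified here; treat the linked page as the authoritative version:

# create a keyring for the new mgr daemon (name is a placeholder)
cephadm shell -- ceph auth get-or-create mgr.ceph1.foo \
    mon "profile mgr" osd "allow *" mds "allow *"

# generate a minimal ceph.conf to embed, together with the keyring,
# in config-json.json
cephadm shell -- ceph config generate-minimal-conf

# deploy the mgr container directly with cephadm, bypassing the
# (currently unavailable) orchestrator
cephadm --image <ceph-container-image> deploy \
    --fsid f270ad9e-1f6f-11ed-b6f8-a539d87379ea \
    --name mgr.ceph1.foo --config-json config-json.json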