Re: [cephadm] mgr: no daemons active

I can see that stale entry in the output below, but I'm not sure how to get rid of it.

root@ceph1:~# ceph orch ps --refresh
NAME                 HOST   STATUS        REFRESHED  AGE  VERSION    IMAGE NAME                                IMAGE ID      CONTAINER ID
alertmanager.ceph1   ceph1  running (9h)  64s ago    2w   0.20.0     quay.io/prometheus/alertmanager:v0.20.0   0881eb8f169f  ba804b555378
cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d  ceph2  stopped  65s ago  -  <unknown>  <unknown>  <unknown>  <unknown>
crash.ceph1          ceph1  running (9h)  64s ago    2w   15.2.17    quay.io/ceph/ceph:v15                     93146564743f  a3a431d834fc
crash.ceph2          ceph2  running (9h)  65s ago    13d  15.2.17    quay.io/ceph/ceph:v15                     93146564743f  3c963693ff2b
grafana.ceph1        ceph1  running (9h)  64s ago    2w   6.7.4      quay.io/ceph/ceph-grafana:6.7.4           557c83e11646  7583a8dc4c61
mgr.ceph1.smfvfd     ceph1  running (8h)  64s ago    8h   15.2.17    quay.io/ceph/ceph@sha256:c08064dde4bba4e72a1f55d90ca32df9ef5aafab82efe2e0a0722444a5aaacca  93146564743f  1aab837306d2
mon.ceph1            ceph1  running (9h)  64s ago    2w   15.2.17    quay.io/ceph/ceph:v15                     93146564743f  c1d155d8c7ad
node-exporter.ceph1  ceph1  running (9h)  64s ago    2w   0.18.1     quay.io/prometheus/node-exporter:v0.18.1  e5a616e4b9cf  2ff235fe0e42
node-exporter.ceph2  ceph2  running (9h)  65s ago    13d  0.18.1     quay.io/prometheus/node-exporter:v0.18.1  e5a616e4b9cf  17678b9ba602
osd.0                ceph1  running (9h)  64s ago    13d  15.2.17    quay.io/ceph/ceph:v15                     93146564743f  d0fd73b777a3
osd.1                ceph1  running (9h)  64s ago    13d  15.2.17    quay.io/ceph/ceph:v15                     93146564743f  049120e83102
osd.2                ceph2  running (9h)  65s ago    13d  15.2.17    quay.io/ceph/ceph:v15                     93146564743f  8700e8cefd1f
osd.3                ceph2  running (9h)  65s ago    13d  15.2.17    quay.io/ceph/ceph:v15                     93146564743f  9c71bc87ed16
prometheus.ceph1     ceph1  running (9h)  64s ago    2w   2.18.1     quay.io/prometheus/prometheus:v2.18.1     de242295e225  74a538efd61e
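
Would something along these lines be the right way to clear that stale entry? I'm only guessing at cephadm's rm-daemon subcommand here (run on ceph2, where the entry lives):

cephadm rm-daemon --fsid f270ad9e-1f6f-11ed-b6f8-a539d87379ea \
    --name cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d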

On Fri, Sep 2, 2022 at 10:10 AM Adam King <adking@xxxxxxxxxx> wrote:

> Maybe also a "ceph orch ps --refresh"? It might still have the old cached
> daemon inventory from before you removed the files.
>
> On Fri, Sep 2, 2022 at 9:57 AM Satish Patel <satish.txt@xxxxxxxxx> wrote:
>
>> Hi Adam,
>>
>> I have deleted the file located here:
>>
>> rm /var/lib/ceph/f270ad9e-1f6f-11ed-b6f8-a539d87379ea/cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d
>>
>> But I'm still getting the same error. Do I need to do anything else?
>>
>> On Fri, Sep 2, 2022 at 9:51 AM Adam King <adking@xxxxxxxxxx> wrote:
>>
>>> Okay, I'm wondering if this is a version-mismatch issue: you previously
>>> had a 16.2.10 mgr, and now have a 15.2.17 one that doesn't expect this
>>> sort of thing to be present. Either way, I'd think just deleting that
>>> cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d
>>> file (and any others like it) would be the way forward to get orch ls
>>> working again.
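>>>
>>> Concretely, something like this on whichever host has it (using the fsid
>>> from your cephadm ls output) is what I have in mind:
>>>
>>> rm /var/lib/ceph/f270ad9e-1f6f-11ed-b6f8-a539d87379ea/cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d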
>>>
>>> On Fri, Sep 2, 2022 at 9:44 AM Satish Patel <satish.txt@xxxxxxxxx>
>>> wrote:
>>>
>>>> Hi Adam,
>>>>
>>>> In "cephadm ls" I found the following service entry, but I believe it
>>>> was there before as well.
>>>>
>>>> {
>>>>     "style": "cephadm:v1",
>>>>     "name": "cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d",
>>>>     "fsid": "f270ad9e-1f6f-11ed-b6f8-a539d87379ea",
>>>>     "systemd_unit": "ceph-f270ad9e-1f6f-11ed-b6f8-a539d87379ea@cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d",
>>>>     "enabled": false,
>>>>     "state": "stopped",
>>>>     "container_id": null,
>>>>     "container_image_name": null,
>>>>     "container_image_id": null,
>>>>     "version": null,
>>>>     "started": null,
>>>>     "created": null,
>>>>     "deployed": null,
>>>>     "configured": null
>>>> },
>>>>
>>>> It looks like the remove didn't work:
>>>>
>>>> root@ceph1:~# ceph orch rm cephadm
>>>> Failed to remove service. <cephadm> was not found.
>>>>
>>>> root@ceph1:~# ceph orch rm cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d
>>>> Failed to remove service. <cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d> was not found.
>>>>
>>>> On Fri, Sep 2, 2022 at 8:27 AM Adam King <adking@xxxxxxxxxx> wrote:
>>>>
>>>>> This looks like an old traceback you would get if you somehow ended up
>>>>> with a service type that shouldn't be there. The first thing I'd check
>>>>> is that "cephadm ls" on either host definitely doesn't report any
>>>>> strange entries that aren't actually daemons in your cluster, such as
>>>>> "cephadm.<hash>". Another thing you could try, since I believe the
>>>>> assertion it's giving is for an unknown service type ("AssertionError:
>>>>> cephadm"), is "ceph orch rm cephadm", which might cause it to remove
>>>>> whatever it thinks this "cephadm" service is that it has deployed.
>>>>> Lastly, you could try making the mgr you manually deploy a 16.2.10 one
>>>>> instead of 15.2.17 (I'm assuming versions here, but the line numbers in
>>>>> that traceback suggest Octopus). The 16.2.10 mgr is much less likely to
>>>>> have a bug that causes something like this.
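>>>>>
>>>>> If you do go the manual route, the rough shape of it is below. This is
>>>>> just a sketch following the cephadm troubleshooting docs; the daemon
>>>>> name, image tag, and fsid are placeholders you'd swap for your own:
>>>>>
>>>>> # pause the cephadm scheduler so it doesn't fight the manual deploy
>>>>> ceph config-key set mgr/cephadm/pause true
>>>>> # create a keyring and a minimal conf for the new mgr
>>>>> ceph auth get-or-create mgr.ceph1.foo mon "profile mgr" osd "allow *" mds "allow *"
>>>>> ceph config generate-minimal-conf
>>>>> # put that conf and keyring into a config-json.json, then deploy:
>>>>> cephadm --image quay.io/ceph/ceph:v16.2.10 deploy --fsid <fsid> \
>>>>>     --name mgr.ceph1.foo --config-json config-json.json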
>>>>>
>>>>> On Fri, Sep 2, 2022 at 1:41 AM Satish Patel <satish.txt@xxxxxxxxx>
>>>>> wrote:
>>>>>
>>>>>> Now "ceph orch ps" works, but the command below throws an error. I'm
>>>>>> trying to bring up a second mgr with "ceph orch apply mgr", but that
>>>>>> didn't help.
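>>>>>>
>>>>>> (The apply form I tried was something like "ceph orch apply mgr
>>>>>> --placement 'ceph1 ceph2'", in case the syntax matters here.)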
>>>>>>
>>>>>> root@ceph1:/ceph-disk# ceph version
>>>>>> ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus (stable)
>>>>>>
>>>>>> root@ceph1:/ceph-disk# ceph orch ls
>>>>>> Error EINVAL: Traceback (most recent call last):
>>>>>>   File "/usr/share/ceph/mgr/mgr_module.py", line 1212, in _handle_command
>>>>>>     return self.handle_command(inbuf, cmd)
>>>>>>   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 140, in handle_command
>>>>>>     return dispatch[cmd['prefix']].call(self, cmd, inbuf)
>>>>>>   File "/usr/share/ceph/mgr/mgr_module.py", line 320, in call
>>>>>>     return self.func(mgr, **kwargs)
>>>>>>   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 102, in <lambda>
>>>>>>     wrapper_copy = lambda *l_args, **l_kwargs: wrapper(*l_args, **l_kwargs)
>>>>>>   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 91, in wrapper
>>>>>>     return func(*args, **kwargs)
>>>>>>   File "/usr/share/ceph/mgr/orchestrator/module.py", line 503, in _list_services
>>>>>>     raise_if_exception(completion)
>>>>>>   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 642, in raise_if_exception
>>>>>>     raise e
>>>>>> AssertionError: cephadm
>>>>>>
>>>>>> On Fri, Sep 2, 2022 at 1:32 AM Satish Patel <satish.txt@xxxxxxxxx>
>>>>>> wrote:
>>>>>>
>>>>>> > Never mind, I found the doc related to that and was able to get one
>>>>>> > mgr up:
>>>>>> > https://docs.ceph.com/en/quincy/cephadm/troubleshooting/#manually-deploying-a-mgr-daemon
>>>>>> >
>>>>>> >
>>>>>> > On Fri, Sep 2, 2022 at 1:21 AM Satish Patel <satish.txt@xxxxxxxxx>
>>>>>> wrote:
>>>>>> >
>>>>>> >> Folks,
>>>>>> >>
>>>>>> >> I am having a little "fun" with cephadm, and it's been very
>>>>>> >> annoying to deal with.
>>>>>> >>
>>>>>> >> I deployed a Ceph cluster using cephadm on two nodes. When I tried
>>>>>> >> to upgrade, I hit a hiccup where it upgraded a single mgr to
>>>>>> >> 16.2.10 but not the other, so I started messing around and somehow
>>>>>> >> deleted both mgrs, thinking cephadm would recreate them.
>>>>>> >>
>>>>>> >> Now I don't have a single mgr, so my ceph orch commands hang
>>>>>> >> forever; it looks like a chicken-and-egg issue.
>>>>>> >>
>>>>>> >> How do I recover from this? If I can't run the ceph orch command,
>>>>>> >> I won't be able to redeploy my mgr daemons.
>>>>>> >>
>>>>>> >> I can't find any mgr in the output of the following command on
>>>>>> >> either node:
>>>>>> >>
>>>>>> >> $ cephadm ls | grep mgr
>>>>>> >>
>>>>>> >
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


