Do you think this is because I have only a single MON daemon running? I have
only two nodes.

On Fri, Sep 2, 2022 at 2:39 PM Satish Patel <satish.txt@xxxxxxxxx> wrote:

> Adam,
>
> I have enabled debug and my logs are flooded with the following. I am going
> to try some of the suggestions from the thread you provided and see.
>
> root@ceph1:~# tail -f /var/log/ceph/f270ad9e-1f6f-11ed-b6f8-a539d87379ea/ceph.cephadm.log
> 2022-09-02T18:38:21.754391+0000 mgr.ceph2.huidoh (mgr.344392) 211198 : cephadm [DBG] 0 OSDs are scheduled for removal: []
> 2022-09-02T18:38:21.754519+0000 mgr.ceph2.huidoh (mgr.344392) 211199 : cephadm [DBG] Saving [] to store
> 2022-09-02T18:38:21.757155+0000 mgr.ceph2.huidoh (mgr.344392) 211200 : cephadm [DBG] refreshing hosts and daemons
> 2022-09-02T18:38:21.758065+0000 mgr.ceph2.huidoh (mgr.344392) 211201 : cephadm [DBG] _check_for_strays
> 2022-09-02T18:38:21.758334+0000 mgr.ceph2.huidoh (mgr.344392) 211202 : cephadm [DBG] 0 OSDs are scheduled for removal: []
> 2022-09-02T18:38:21.758455+0000 mgr.ceph2.huidoh (mgr.344392) 211203 : cephadm [DBG] Saving [] to store
> 2022-09-02T18:38:21.761001+0000 mgr.ceph2.huidoh (mgr.344392) 211204 : cephadm [DBG] refreshing hosts and daemons
> 2022-09-02T18:38:21.762092+0000 mgr.ceph2.huidoh (mgr.344392) 211205 : cephadm [DBG] _check_for_strays
> 2022-09-02T18:38:21.762357+0000 mgr.ceph2.huidoh (mgr.344392) 211206 : cephadm [DBG] 0 OSDs are scheduled for removal: []
> 2022-09-02T18:38:21.762480+0000 mgr.ceph2.huidoh (mgr.344392) 211207 : cephadm [DBG] Saving [] to store
>
> On Fri, Sep 2, 2022 at 12:17 PM Adam King <adking@xxxxxxxxxx> wrote:
>
>> Hmm, okay. It seems like cephadm is stuck in general rather than hitting an
>> issue specific to the upgrade. I'd first make sure the orchestrator isn't
>> paused (just running "ceph orch resume" should be enough; it's idempotent).
>>
>> Beyond that, someone else had an issue with things getting stuck that was
>> resolved in this thread, which might be worth a look:
>> https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/NKKLV5TMHFA3ERGCMJ3M7BVLA5PGIR4M/#NKKLV5TMHFA3ERGCMJ3M7BVLA5PGIR4M
>>
>> If you haven't already, it's possible that stopping the upgrade is a good
>> idea, as it may be interfering with cephadm getting to the point where it
>> does the redeploy.
>>
>> If none of those help, it might be worth setting the log level to debug and
>> seeing where things are ending up: run "ceph config set mgr
>> mgr/cephadm/log_to_cluster_level debug" followed by "ceph orch ps --refresh",
>> wait a few minutes, then run "ceph log last 100 debug cephadm" (not 100% sure
>> of the format of that command; if it fails, try just "ceph log last cephadm").
>> We could maybe get more info from those debug logs on why it's not performing
>> the redeploy. Just remember to set the log level back afterwards with "ceph
>> config set mgr mgr/cephadm/log_to_cluster_level info", as debug logs are
>> quite verbose.
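For reference, the debug-logging sequence described above, collected in one
place. This is a sketch only; as noted above, the exact syntax of the
"ceph log last" command may differ between releases.

$ ceph orch resume                                             # make sure the orchestrator is not paused
$ ceph orch upgrade stop                                       # optionally stop the stuck upgrade first
$ ceph config set mgr mgr/cephadm/log_to_cluster_level debug   # raise the cephadm cluster log level
$ ceph orch ps --refresh                                       # trigger a refresh, then wait a few minutes
$ ceph log last 100 debug cephadm                              # read the recent cephadm debug messages
$ ceph log last cephadm                                        # fallback if the previous form is rejected
$ ceph config set mgr mgr/cephadm/log_to_cluster_level info    # set the log level back when finished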
>> >> On Fri, Sep 2, 2022 at 11:39 AM Satish Patel <satish.txt@xxxxxxxxx> >> wrote: >> >>> Hi Adam, >>> >>> As you said, i did following >>> >>> $ ceph orch daemon redeploy mgr.ceph1.smfvfd quay.io/ceph/ceph:v16.2.10 >>> >>> Noticed following line in logs but then no activity nothing, still >>> standby mgr running in older version >>> >>> 2022-09-02T15:35:45.753093+0000 mgr.ceph2.huidoh (mgr.344392) 2226 : >>> cephadm [INF] Schedule redeploy daemon mgr.ceph1.smfvfd >>> 2022-09-02T15:36:17.279190+0000 mgr.ceph2.huidoh (mgr.344392) 2245 : >>> cephadm [INF] refreshing ceph2 facts >>> 2022-09-02T15:36:17.984478+0000 mgr.ceph2.huidoh (mgr.344392) 2246 : >>> cephadm [INF] refreshing ceph1 facts >>> 2022-09-02T15:37:17.663730+0000 mgr.ceph2.huidoh (mgr.344392) 2284 : >>> cephadm [INF] refreshing ceph2 facts >>> 2022-09-02T15:37:18.386586+0000 mgr.ceph2.huidoh (mgr.344392) 2285 : >>> cephadm [INF] refreshing ceph1 facts >>> >>> I am not seeing any image get downloaded also >>> >>> root@ceph1:~# docker image ls >>> REPOSITORY TAG IMAGE ID CREATED >>> SIZE >>> quay.io/ceph/ceph v15 93146564743f 3 weeks ago >>> 1.2GB >>> quay.io/ceph/ceph-grafana 8.3.5 dad864ee21e9 4 months >>> ago 558MB >>> quay.io/prometheus/prometheus v2.33.4 514e6a882f6e 6 months >>> ago 204MB >>> quay.io/prometheus/alertmanager v0.23.0 ba2b418f427c 12 months >>> ago 57.5MB >>> quay.io/ceph/ceph-grafana 6.7.4 557c83e11646 13 months >>> ago 486MB >>> quay.io/prometheus/prometheus v2.18.1 de242295e225 2 years ago >>> 140MB >>> quay.io/prometheus/alertmanager v0.20.0 0881eb8f169f 2 years ago >>> 52.1MB >>> quay.io/prometheus/node-exporter v0.18.1 e5a616e4b9cf 3 years ago >>> 22.9MB >>> >>> >>> On Fri, Sep 2, 2022 at 11:06 AM Adam King <adking@xxxxxxxxxx> wrote: >>> >>>> hmm, at this point, maybe we should just try manually upgrading the mgr >>>> daemons and then move from there. First, just stop the upgrade "ceph orch >>>> upgrade stop". If you figure out which of the two mgr daemons is the >>>> standby (it should say which one is active in "ceph -s" output) and then do >>>> a "ceph orch daemon redeploy <standby-mgr-name> >>>> quay.io/ceph/ceph:v16.2.10" it should redeploy that specific mgr with >>>> the new version. You could then do a "ceph mgr fail" to swap which of the >>>> mgr daemons is active, then do another "ceph orch daemon redeploy >>>> <standby-mgr-name> quay.io/ceph/ceph:v16.2.10" where the standby is >>>> now the other mgr still on 15.2.17. Once the mgr daemons are both upgraded >>>> to the new version, run a "ceph orch redeploy mgr" and then "ceph orch >>>> upgrade start --image quay.io/ceph/ceph:v16.2.10" and see if it goes >>>> better. >>>> >>>> On Fri, Sep 2, 2022 at 10:36 AM Satish Patel <satish.txt@xxxxxxxxx> >>>> wrote: >>>> >>>>> Hi Adam, >>>>> >>>>> I run the following command to upgrade but it looks like nothing is >>>>> happening >>>>> >>>>> $ ceph orch upgrade start --image quay.io/ceph/ceph:v16.2.10 >>>>> >>>>> Status message is empty.. 
>>>>> >>>>> root@ceph1:~# ceph orch upgrade status >>>>> { >>>>> "target_image": "quay.io/ceph/ceph:v16.2.10", >>>>> "in_progress": true, >>>>> "services_complete": [], >>>>> "message": "" >>>>> } >>>>> >>>>> Nothing in Logs >>>>> >>>>> root@ceph1:~# tail -f >>>>> /var/log/ceph/f270ad9e-1f6f-11ed-b6f8-a539d87379ea/ceph.cephadm.log >>>>> 2022-09-02T14:31:52.597661+0000 mgr.ceph2.huidoh (mgr.344392) 174 : >>>>> cephadm [INF] refreshing ceph2 facts >>>>> 2022-09-02T14:31:52.991450+0000 mgr.ceph2.huidoh (mgr.344392) 176 : >>>>> cephadm [INF] refreshing ceph1 facts >>>>> 2022-09-02T14:32:52.965092+0000 mgr.ceph2.huidoh (mgr.344392) 207 : >>>>> cephadm [INF] refreshing ceph2 facts >>>>> 2022-09-02T14:32:53.369789+0000 mgr.ceph2.huidoh (mgr.344392) 208 : >>>>> cephadm [INF] refreshing ceph1 facts >>>>> 2022-09-02T14:33:53.367986+0000 mgr.ceph2.huidoh (mgr.344392) 239 : >>>>> cephadm [INF] refreshing ceph2 facts >>>>> 2022-09-02T14:33:53.760427+0000 mgr.ceph2.huidoh (mgr.344392) 240 : >>>>> cephadm [INF] refreshing ceph1 facts >>>>> 2022-09-02T14:34:53.754277+0000 mgr.ceph2.huidoh (mgr.344392) 272 : >>>>> cephadm [INF] refreshing ceph2 facts >>>>> 2022-09-02T14:34:54.162503+0000 mgr.ceph2.huidoh (mgr.344392) 273 : >>>>> cephadm [INF] refreshing ceph1 facts >>>>> 2022-09-02T14:35:54.133467+0000 mgr.ceph2.huidoh (mgr.344392) 305 : >>>>> cephadm [INF] refreshing ceph2 facts >>>>> 2022-09-02T14:35:54.522171+0000 mgr.ceph2.huidoh (mgr.344392) 306 : >>>>> cephadm [INF] refreshing ceph1 facts >>>>> >>>>> In progress that mesg stuck there for long time >>>>> >>>>> root@ceph1:~# ceph -s >>>>> cluster: >>>>> id: f270ad9e-1f6f-11ed-b6f8-a539d87379ea >>>>> health: HEALTH_OK >>>>> >>>>> services: >>>>> mon: 1 daemons, quorum ceph1 (age 9h) >>>>> mgr: ceph2.huidoh(active, since 9m), standbys: ceph1.smfvfd >>>>> osd: 4 osds: 4 up (since 9h), 4 in (since 11h) >>>>> >>>>> data: >>>>> pools: 5 pools, 129 pgs >>>>> objects: 20.06k objects, 83 GiB >>>>> usage: 168 GiB used, 632 GiB / 800 GiB avail >>>>> pgs: 129 active+clean >>>>> >>>>> io: >>>>> client: 12 KiB/s wr, 0 op/s rd, 1 op/s wr >>>>> >>>>> progress: >>>>> Upgrade to quay.io/ceph/ceph:v16.2.10 (0s) >>>>> [............................] >>>>> >>>>> On Fri, Sep 2, 2022 at 10:25 AM Satish Patel <satish.txt@xxxxxxxxx> >>>>> wrote: >>>>> >>>>>> It Looks like I did it with the following command. >>>>>> >>>>>> $ ceph orch daemon add mgr ceph2:10.73.0.192 >>>>>> >>>>>> Now i can see two with same version 15.x >>>>>> >>>>>> root@ceph1:~# ceph orch ps --daemon-type mgr >>>>>> NAME HOST STATUS REFRESHED AGE VERSION >>>>>> IMAGE NAME >>>>>> IMAGE ID CONTAINER ID >>>>>> mgr.ceph1.smfvfd ceph1 running (8h) 41s ago 8h 15.2.17 >>>>>> quay.io/ceph/ceph@sha256:c08064dde4bba4e72a1f55d90ca32df9ef5aafab82efe2e0a0722444a5aaacca >>>>>> 93146564743f 1aab837306d2 >>>>>> mgr.ceph2.huidoh ceph2 running (60s) 110s ago 60s 15.2.17 >>>>>> quay.io/ceph/ceph@sha256:c08064dde4bba4e72a1f55d90ca32df9ef5aafab82efe2e0a0722444a5aaacca >>>>>> 93146564743f 294fd6ab6c97 >>>>>> >>>>>> On Fri, Sep 2, 2022 at 10:19 AM Satish Patel <satish.txt@xxxxxxxxx> >>>>>> wrote: >>>>>> >>>>>>> Let's come back to the original question: how to bring back the >>>>>>> second mgr? >>>>>>> >>>>>>> root@ceph1:~# ceph orch apply mgr 2 >>>>>>> Scheduled mgr update... 
>>>>>>> >>>>>>> Nothing happened with above command, logs saying nothing >>>>>>> >>>>>>> 2022-09-02T14:16:20.407927+0000 mgr.ceph1.smfvfd (mgr.334626) 16939 >>>>>>> : cephadm [INF] refreshing ceph2 facts >>>>>>> 2022-09-02T14:16:40.247195+0000 mgr.ceph1.smfvfd (mgr.334626) 16952 >>>>>>> : cephadm [INF] Saving service mgr spec with placement count:2 >>>>>>> 2022-09-02T14:16:53.106919+0000 mgr.ceph1.smfvfd (mgr.334626) 16961 >>>>>>> : cephadm [INF] Saving service mgr spec with placement count:2 >>>>>>> 2022-09-02T14:17:19.135203+0000 mgr.ceph1.smfvfd (mgr.334626) 16975 >>>>>>> : cephadm [INF] refreshing ceph1 facts >>>>>>> 2022-09-02T14:17:20.780496+0000 mgr.ceph1.smfvfd (mgr.334626) 16977 >>>>>>> : cephadm [INF] refreshing ceph2 facts >>>>>>> 2022-09-02T14:18:19.502034+0000 mgr.ceph1.smfvfd (mgr.334626) 17008 >>>>>>> : cephadm [INF] refreshing ceph1 facts >>>>>>> 2022-09-02T14:18:21.127973+0000 mgr.ceph1.smfvfd (mgr.334626) 17010 >>>>>>> : cephadm [INF] refreshing ceph2 facts >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Fri, Sep 2, 2022 at 10:15 AM Satish Patel <satish.txt@xxxxxxxxx> >>>>>>> wrote: >>>>>>> >>>>>>>> Hi Adam, >>>>>>>> >>>>>>>> Wait..wait.. now it's working suddenly without doing anything.. >>>>>>>> very odd >>>>>>>> >>>>>>>> root@ceph1:~# ceph orch ls >>>>>>>> NAME RUNNING REFRESHED AGE PLACEMENT IMAGE >>>>>>>> NAME >>>>>>>> IMAGE ID >>>>>>>> alertmanager 1/1 5s ago 2w count:1 >>>>>>>> quay.io/prometheus/alertmanager:v0.20.0 >>>>>>>> 0881eb8f169f >>>>>>>> crash 2/2 5s ago 2w * >>>>>>>> quay.io/ceph/ceph:v15 >>>>>>>> 93146564743f >>>>>>>> grafana 1/1 5s ago 2w count:1 >>>>>>>> quay.io/ceph/ceph-grafana:6.7.4 >>>>>>>> 557c83e11646 >>>>>>>> mgr 1/2 5s ago 8h count:2 >>>>>>>> quay.io/ceph/ceph@sha256:c08064dde4bba4e72a1f55d90ca32df9ef5aafab82efe2e0a0722444a5aaacca >>>>>>>> 93146564743f >>>>>>>> mon 1/2 5s ago 8h ceph1;ceph2 >>>>>>>> quay.io/ceph/ceph:v15 >>>>>>>> 93146564743f >>>>>>>> node-exporter 2/2 5s ago 2w * >>>>>>>> quay.io/prometheus/node-exporter:v0.18.1 >>>>>>>> e5a616e4b9cf >>>>>>>> osd.osd_spec_default 4/0 5s ago - <unmanaged> >>>>>>>> quay.io/ceph/ceph:v15 >>>>>>>> 93146564743f >>>>>>>> prometheus 1/1 5s ago 2w count:1 >>>>>>>> quay.io/prometheus/prometheus:v2.18.1 >>>>>>>> >>>>>>>> On Fri, Sep 2, 2022 at 10:13 AM Satish Patel <satish.txt@xxxxxxxxx> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> I can see that in the output but I'm not sure how to get rid of >>>>>>>>> it. 
>>>>>>>>> >>>>>>>>> root@ceph1:~# ceph orch ps --refresh >>>>>>>>> NAME >>>>>>>>> HOST STATUS REFRESHED AGE VERSION IMAGE NAME >>>>>>>>> IMAGE >>>>>>>>> ID CONTAINER ID >>>>>>>>> alertmanager.ceph1 >>>>>>>>> ceph1 running (9h) 64s ago 2w 0.20.0 >>>>>>>>> quay.io/prometheus/alertmanager:v0.20.0 >>>>>>>>> 0881eb8f169f ba804b555378 >>>>>>>>> cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d >>>>>>>>> ceph2 stopped 65s ago - <unknown> <unknown> >>>>>>>>> <unknown> >>>>>>>>> <unknown> >>>>>>>>> crash.ceph1 >>>>>>>>> ceph1 running (9h) 64s ago 2w 15.2.17 >>>>>>>>> quay.io/ceph/ceph:v15 >>>>>>>>> 93146564743f a3a431d834fc >>>>>>>>> crash.ceph2 >>>>>>>>> ceph2 running (9h) 65s ago 13d 15.2.17 >>>>>>>>> quay.io/ceph/ceph:v15 >>>>>>>>> 93146564743f 3c963693ff2b >>>>>>>>> grafana.ceph1 >>>>>>>>> ceph1 running (9h) 64s ago 2w 6.7.4 >>>>>>>>> quay.io/ceph/ceph-grafana:6.7.4 >>>>>>>>> 557c83e11646 7583a8dc4c61 >>>>>>>>> mgr.ceph1.smfvfd >>>>>>>>> ceph1 running (8h) 64s ago 8h 15.2.17 >>>>>>>>> quay.io/ceph/ceph@sha256:c08064dde4bba4e72a1f55d90ca32df9ef5aafab82efe2e0a0722444a5aaacca >>>>>>>>> 93146564743f 1aab837306d2 >>>>>>>>> mon.ceph1 >>>>>>>>> ceph1 running (9h) 64s ago 2w 15.2.17 >>>>>>>>> quay.io/ceph/ceph:v15 >>>>>>>>> 93146564743f c1d155d8c7ad >>>>>>>>> node-exporter.ceph1 >>>>>>>>> ceph1 running (9h) 64s ago 2w 0.18.1 >>>>>>>>> quay.io/prometheus/node-exporter:v0.18.1 >>>>>>>>> e5a616e4b9cf 2ff235fe0e42 >>>>>>>>> node-exporter.ceph2 >>>>>>>>> ceph2 running (9h) 65s ago 13d 0.18.1 >>>>>>>>> quay.io/prometheus/node-exporter:v0.18.1 >>>>>>>>> e5a616e4b9cf 17678b9ba602 >>>>>>>>> osd.0 >>>>>>>>> ceph1 running (9h) 64s ago 13d 15.2.17 >>>>>>>>> quay.io/ceph/ceph:v15 >>>>>>>>> 93146564743f d0fd73b777a3 >>>>>>>>> osd.1 >>>>>>>>> ceph1 running (9h) 64s ago 13d 15.2.17 >>>>>>>>> quay.io/ceph/ceph:v15 >>>>>>>>> 93146564743f 049120e83102 >>>>>>>>> osd.2 >>>>>>>>> ceph2 running (9h) 65s ago 13d 15.2.17 >>>>>>>>> quay.io/ceph/ceph:v15 >>>>>>>>> 93146564743f 8700e8cefd1f >>>>>>>>> osd.3 >>>>>>>>> ceph2 running (9h) 65s ago 13d 15.2.17 >>>>>>>>> quay.io/ceph/ceph:v15 >>>>>>>>> 93146564743f 9c71bc87ed16 >>>>>>>>> prometheus.ceph1 >>>>>>>>> ceph1 running (9h) 64s ago 2w 2.18.1 >>>>>>>>> quay.io/prometheus/prometheus:v2.18.1 >>>>>>>>> de242295e225 74a538efd61e >>>>>>>>> >>>>>>>>> On Fri, Sep 2, 2022 at 10:10 AM Adam King <adking@xxxxxxxxxx> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> maybe also a "ceph orch ps --refresh"? It might still have the >>>>>>>>>> old cached daemon inventory from before you remove the files. >>>>>>>>>> >>>>>>>>>> On Fri, Sep 2, 2022 at 9:57 AM Satish Patel <satish.txt@xxxxxxxxx> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Hi Adam, >>>>>>>>>>> >>>>>>>>>>> I have deleted file located here - rm >>>>>>>>>>> /var/lib/ceph/f270ad9e-1f6f-11ed-b6f8-a539d87379ea/cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d >>>>>>>>>>> >>>>>>>>>>> But still getting the same error, do i need to do anything else? >>>>>>>>>>> >>>>>>>>>>> On Fri, Sep 2, 2022 at 9:51 AM Adam King <adking@xxxxxxxxxx> >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> Okay, I'm wondering if this is an issue with version mismatch. >>>>>>>>>>>> Having previously had a 16.2.10 mgr and then now having a 15.2.17 one that >>>>>>>>>>>> doesn't expect this sort of thing to be present. 
Either way, I'd think just >>>>>>>>>>>> deleting this cephadm.7ce656a8721deb5054c37b0cfb9038 >>>>>>>>>>>> 1522d521dde51fb0c5a2142314d663f63d (and any others like it) file >>>>>>>>>>>> would be the way forward to get orch ls working again. >>>>>>>>>>>> >>>>>>>>>>>> On Fri, Sep 2, 2022 at 9:44 AM Satish Patel < >>>>>>>>>>>> satish.txt@xxxxxxxxx> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi Adam, >>>>>>>>>>>>> >>>>>>>>>>>>> In cephadm ls i found the following service but i believe it >>>>>>>>>>>>> was there before also. >>>>>>>>>>>>> >>>>>>>>>>>>> { >>>>>>>>>>>>> "style": "cephadm:v1", >>>>>>>>>>>>> "name": >>>>>>>>>>>>> "cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d", >>>>>>>>>>>>> "fsid": "f270ad9e-1f6f-11ed-b6f8-a539d87379ea", >>>>>>>>>>>>> "systemd_unit": >>>>>>>>>>>>> "ceph-f270ad9e-1f6f-11ed-b6f8-a539d87379ea@cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d >>>>>>>>>>>>> ", >>>>>>>>>>>>> "enabled": false, >>>>>>>>>>>>> "state": "stopped", >>>>>>>>>>>>> "container_id": null, >>>>>>>>>>>>> "container_image_name": null, >>>>>>>>>>>>> "container_image_id": null, >>>>>>>>>>>>> "version": null, >>>>>>>>>>>>> "started": null, >>>>>>>>>>>>> "created": null, >>>>>>>>>>>>> "deployed": null, >>>>>>>>>>>>> "configured": null >>>>>>>>>>>>> }, >>>>>>>>>>>>> >>>>>>>>>>>>> Look like remove didn't work >>>>>>>>>>>>> >>>>>>>>>>>>> root@ceph1:~# ceph orch rm cephadm >>>>>>>>>>>>> Failed to remove service. <cephadm> was not found. >>>>>>>>>>>>> >>>>>>>>>>>>> root@ceph1:~# ceph orch rm >>>>>>>>>>>>> cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d >>>>>>>>>>>>> Failed to remove service. >>>>>>>>>>>>> <cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d> >>>>>>>>>>>>> was not found. >>>>>>>>>>>>> >>>>>>>>>>>>> On Fri, Sep 2, 2022 at 8:27 AM Adam King <adking@xxxxxxxxxx> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> this looks like an old traceback you would get if you ended >>>>>>>>>>>>>> up with a service type that shouldn't be there somehow. The things I'd >>>>>>>>>>>>>> probably check are that "cephadm ls" on either host definitely doesn't >>>>>>>>>>>>>> report and strange things that aren't actually daemons in your cluster such >>>>>>>>>>>>>> as "cephadm.<hash>". Another thing you could maybe try, as I believe the >>>>>>>>>>>>>> assertion it's giving is for an unknown service type here ("AssertionError: >>>>>>>>>>>>>> cephadm"), is just "ceph orch rm cephadm" which would maybe cause it to >>>>>>>>>>>>>> remove whatever it thinks is this "cephadm" service that it has deployed. >>>>>>>>>>>>>> Lastly, you could try having the mgr you manually deploy be a 16.2.10 one >>>>>>>>>>>>>> instead of 15.2.17 (I'm assuming here, but the line numbers in that >>>>>>>>>>>>>> traceback suggest octopus). The 16.2.10 one is just much less likely to >>>>>>>>>>>>>> have a bug that causes something like this. >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, Sep 2, 2022 at 1:41 AM Satish Patel < >>>>>>>>>>>>>> satish.txt@xxxxxxxxx> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Now when I run "ceph orch ps" it works but the following >>>>>>>>>>>>>>> command throws an >>>>>>>>>>>>>>> error. 
Trying to bring up second mgr using ceph orch apply >>>>>>>>>>>>>>> mgr command but >>>>>>>>>>>>>>> didn't help >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> root@ceph1:/ceph-disk# ceph version >>>>>>>>>>>>>>> ceph version 15.2.17 >>>>>>>>>>>>>>> (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus >>>>>>>>>>>>>>> (stable) >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> root@ceph1:/ceph-disk# ceph orch ls >>>>>>>>>>>>>>> Error EINVAL: Traceback (most recent call last): >>>>>>>>>>>>>>> File "/usr/share/ceph/mgr/mgr_module.py", line 1212, in >>>>>>>>>>>>>>> _handle_command >>>>>>>>>>>>>>> return self.handle_command(inbuf, cmd) >>>>>>>>>>>>>>> File "/usr/share/ceph/mgr/orchestrator/_interface.py", >>>>>>>>>>>>>>> line 140, in >>>>>>>>>>>>>>> handle_command >>>>>>>>>>>>>>> return dispatch[cmd['prefix']].call(self, cmd, inbuf) >>>>>>>>>>>>>>> File "/usr/share/ceph/mgr/mgr_module.py", line 320, in call >>>>>>>>>>>>>>> return self.func(mgr, **kwargs) >>>>>>>>>>>>>>> File "/usr/share/ceph/mgr/orchestrator/_interface.py", >>>>>>>>>>>>>>> line 102, in >>>>>>>>>>>>>>> <lambda> >>>>>>>>>>>>>>> wrapper_copy = lambda *l_args, **l_kwargs: >>>>>>>>>>>>>>> wrapper(*l_args, **l_kwargs) >>>>>>>>>>>>>>> File "/usr/share/ceph/mgr/orchestrator/_interface.py", >>>>>>>>>>>>>>> line 91, in wrapper >>>>>>>>>>>>>>> return func(*args, **kwargs) >>>>>>>>>>>>>>> File "/usr/share/ceph/mgr/orchestrator/module.py", line >>>>>>>>>>>>>>> 503, in >>>>>>>>>>>>>>> _list_services >>>>>>>>>>>>>>> raise_if_exception(completion) >>>>>>>>>>>>>>> File "/usr/share/ceph/mgr/orchestrator/_interface.py", >>>>>>>>>>>>>>> line 642, in >>>>>>>>>>>>>>> raise_if_exception >>>>>>>>>>>>>>> raise e >>>>>>>>>>>>>>> AssertionError: cephadm >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Fri, Sep 2, 2022 at 1:32 AM Satish Patel < >>>>>>>>>>>>>>> satish.txt@xxxxxxxxx> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> > nevermind, i found doc related that and i am able to get 1 >>>>>>>>>>>>>>> mgr up - >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> https://docs.ceph.com/en/quincy/cephadm/troubleshooting/#manually-deploying-a-mgr-daemon >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > On Fri, Sep 2, 2022 at 1:21 AM Satish Patel < >>>>>>>>>>>>>>> satish.txt@xxxxxxxxx> wrote: >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> >> Folks, >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> >> I am having little fun time with cephadm and it's very >>>>>>>>>>>>>>> annoying to deal >>>>>>>>>>>>>>> >> with it >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> >> I have deployed a ceph cluster using cephadm on two >>>>>>>>>>>>>>> nodes. Now when i was >>>>>>>>>>>>>>> >> trying to upgrade and noticed hiccups where it just >>>>>>>>>>>>>>> upgraded a single mgr >>>>>>>>>>>>>>> >> with 16.2.10 but not other so i started messing around >>>>>>>>>>>>>>> and somehow I >>>>>>>>>>>>>>> >> deleted both mgr in the thought that cephadm will >>>>>>>>>>>>>>> recreate them. >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> >> Now i don't have any single mgr so my ceph orch command >>>>>>>>>>>>>>> hangs forever and >>>>>>>>>>>>>>> >> looks like a chicken egg issue. >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> >> How do I recover from this? If I can't run the ceph orch >>>>>>>>>>>>>>> command, I won't >>>>>>>>>>>>>>> >> be able to redeploy my mgr daemons. >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> >> I am not able to find any mgr in the following command on >>>>>>>>>>>>>>> both nodes. 
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >> $ cephadm ls | grep mgr
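For anyone landing on this thread in the same state: if no mgr is running at
all (the chicken-and-egg situation in the original post above), a mgr first has
to be brought back manually per the troubleshooting doc linked above before any
"ceph orch" command will respond. After that, the manual mgr upgrade path Adam
outlines in his 11:06 AM message, collected in one place, looks roughly like
this. A sketch only: the daemon names mgr.ceph1.smfvfd and mgr.ceph2.huidoh are
specific to this two-node cluster, and which one is the standby at any moment
comes from the "ceph -s" output.

$ ceph orch upgrade stop                                                   # stop the stuck upgrade
$ ceph -s                                                                  # note which mgr is active and which is standby
$ ceph orch daemon redeploy mgr.ceph1.smfvfd quay.io/ceph/ceph:v16.2.10    # redeploy the standby mgr on the new image
$ ceph mgr fail                                                            # fail over so the other mgr becomes standby
$ ceph orch daemon redeploy mgr.ceph2.huidoh quay.io/ceph/ceph:v16.2.10    # redeploy the remaining 15.2.17 mgr
$ ceph orch redeploy mgr                                                   # once both mgrs are on the new version
$ ceph orch upgrade start --image quay.io/ceph/ceph:v16.2.10               # then retry the upgrade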