I don't think the number of mons should have any effect on this. Looking at your logs, the interesting thing is that all the messages are so close together. Was this from before you stopped the upgrade? On Fri, Sep 2, 2022 at 2:53 PM Satish Patel <satish.txt@xxxxxxxxx> wrote: > Do you think this is because I have only a single MON daemon running? I > have only two nodes. > > On Fri, Sep 2, 2022 at 2:39 PM Satish Patel <satish.txt@xxxxxxxxx> wrote: > >> Adam, >> >> I have enabled debug logging and my logs are flooded with the following. I am going to >> try some of the suggestions from the thread you provided and see how it goes. >> >> root@ceph1:~# tail -f >> /var/log/ceph/f270ad9e-1f6f-11ed-b6f8-a539d87379ea/ceph.cephadm.log >> 2022-09-02T18:38:21.754391+0000 mgr.ceph2.huidoh (mgr.344392) 211198 : >> cephadm [DBG] 0 OSDs are scheduled for removal: [] >> 2022-09-02T18:38:21.754519+0000 mgr.ceph2.huidoh (mgr.344392) 211199 : >> cephadm [DBG] Saving [] to store >> 2022-09-02T18:38:21.757155+0000 mgr.ceph2.huidoh (mgr.344392) 211200 : >> cephadm [DBG] refreshing hosts and daemons >> 2022-09-02T18:38:21.758065+0000 mgr.ceph2.huidoh (mgr.344392) 211201 : >> cephadm [DBG] _check_for_strays >> 2022-09-02T18:38:21.758334+0000 mgr.ceph2.huidoh (mgr.344392) 211202 : >> cephadm [DBG] 0 OSDs are scheduled for removal: [] >> 2022-09-02T18:38:21.758455+0000 mgr.ceph2.huidoh (mgr.344392) 211203 : >> cephadm [DBG] Saving [] to store >> 2022-09-02T18:38:21.761001+0000 mgr.ceph2.huidoh (mgr.344392) 211204 : >> cephadm [DBG] refreshing hosts and daemons >> 2022-09-02T18:38:21.762092+0000 mgr.ceph2.huidoh (mgr.344392) 211205 : >> cephadm [DBG] _check_for_strays >> 2022-09-02T18:38:21.762357+0000 mgr.ceph2.huidoh (mgr.344392) 211206 : >> cephadm [DBG] 0 OSDs are scheduled for removal: [] >> 2022-09-02T18:38:21.762480+0000 mgr.ceph2.huidoh (mgr.344392) 211207 : >> cephadm [DBG] Saving [] to store >> >> On Fri, Sep 2, 2022 at 12:17 PM Adam King <adking@xxxxxxxxxx> wrote: >> >>> Hmm, okay. It seems like cephadm is stuck in general rather than an >>> issue specific to the upgrade. I'd first make sure the orchestrator isn't >>> paused (just running "ceph orch resume" should be enough; it's idempotent). >>> >>> Beyond that, someone else had an issue with things getting >>> stuck that was resolved in this thread, which might be worth a look: >>> https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/NKKLV5TMHFA3ERGCMJ3M7BVLA5PGIR4M/#NKKLV5TMHFA3ERGCMJ3M7BVLA5PGIR4M >>> >>> If you haven't already, stopping the upgrade is probably a good >>> idea, as it may be interfering with cephadm getting to the point where it >>> does the redeploy. >>> >>> If none of those help, it might be worth setting the log level to debug >>> and seeing where things are ending up: run "ceph config set mgr >>> mgr/cephadm/log_to_cluster_level debug; ceph orch ps --refresh", then >>> wait a few minutes before running "ceph log last 100 debug cephadm" (not >>> 100% sure on the format of that command; if it fails, try just "ceph log last >>> cephadm"). We could maybe get more info on why it's not performing the >>> redeploy from those debug logs. Just remember to set the log level back >>> afterwards with 'ceph config set mgr mgr/cephadm/log_to_cluster_level info', as debug >>> logs are quite verbose.
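>>> In other words, the debug cycle would look roughly like this (just a sketch; as noted above, the "ceph log last" format may need adjusting):
>>>
>>> # turn cephadm debug logging on and trigger a refresh
>>> ceph config set mgr mgr/cephadm/log_to_cluster_level debug
>>> ceph orch ps --refresh
>>> # wait a few minutes, then read the recent cephadm entries from the cluster log
>>> ceph log last 100 debug cephadm
>>> # set the level back once done, since debug is very verbose
>>> ceph config set mgr mgr/cephadm/log_to_cluster_level info
>>>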
>>> >>> On Fri, Sep 2, 2022 at 11:39 AM Satish Patel <satish.txt@xxxxxxxxx> >>> wrote: >>> >>>> Hi Adam, >>>> >>>> As you said, i did following >>>> >>>> $ ceph orch daemon redeploy mgr.ceph1.smfvfd >>>> quay.io/ceph/ceph:v16.2.10 >>>> >>>> Noticed following line in logs but then no activity nothing, still >>>> standby mgr running in older version >>>> >>>> 2022-09-02T15:35:45.753093+0000 mgr.ceph2.huidoh (mgr.344392) 2226 : >>>> cephadm [INF] Schedule redeploy daemon mgr.ceph1.smfvfd >>>> 2022-09-02T15:36:17.279190+0000 mgr.ceph2.huidoh (mgr.344392) 2245 : >>>> cephadm [INF] refreshing ceph2 facts >>>> 2022-09-02T15:36:17.984478+0000 mgr.ceph2.huidoh (mgr.344392) 2246 : >>>> cephadm [INF] refreshing ceph1 facts >>>> 2022-09-02T15:37:17.663730+0000 mgr.ceph2.huidoh (mgr.344392) 2284 : >>>> cephadm [INF] refreshing ceph2 facts >>>> 2022-09-02T15:37:18.386586+0000 mgr.ceph2.huidoh (mgr.344392) 2285 : >>>> cephadm [INF] refreshing ceph1 facts >>>> >>>> I am not seeing any image get downloaded also >>>> >>>> root@ceph1:~# docker image ls >>>> REPOSITORY TAG IMAGE ID CREATED >>>> SIZE >>>> quay.io/ceph/ceph v15 93146564743f 3 weeks >>>> ago 1.2GB >>>> quay.io/ceph/ceph-grafana 8.3.5 dad864ee21e9 4 months >>>> ago 558MB >>>> quay.io/prometheus/prometheus v2.33.4 514e6a882f6e 6 months >>>> ago 204MB >>>> quay.io/prometheus/alertmanager v0.23.0 ba2b418f427c 12 months >>>> ago 57.5MB >>>> quay.io/ceph/ceph-grafana 6.7.4 557c83e11646 13 months >>>> ago 486MB >>>> quay.io/prometheus/prometheus v2.18.1 de242295e225 2 years >>>> ago 140MB >>>> quay.io/prometheus/alertmanager v0.20.0 0881eb8f169f 2 years >>>> ago 52.1MB >>>> quay.io/prometheus/node-exporter v0.18.1 e5a616e4b9cf 3 years >>>> ago 22.9MB >>>> >>>> >>>> On Fri, Sep 2, 2022 at 11:06 AM Adam King <adking@xxxxxxxxxx> wrote: >>>> >>>>> hmm, at this point, maybe we should just try manually upgrading the >>>>> mgr daemons and then move from there. First, just stop the upgrade "ceph >>>>> orch upgrade stop". If you figure out which of the two mgr daemons is the >>>>> standby (it should say which one is active in "ceph -s" output) and then do >>>>> a "ceph orch daemon redeploy <standby-mgr-name> >>>>> quay.io/ceph/ceph:v16.2.10" it should redeploy that specific mgr with >>>>> the new version. You could then do a "ceph mgr fail" to swap which of the >>>>> mgr daemons is active, then do another "ceph orch daemon redeploy >>>>> <standby-mgr-name> quay.io/ceph/ceph:v16.2.10" where the standby is >>>>> now the other mgr still on 15.2.17. Once the mgr daemons are both upgraded >>>>> to the new version, run a "ceph orch redeploy mgr" and then "ceph orch >>>>> upgrade start --image quay.io/ceph/ceph:v16.2.10" and see if it goes >>>>> better. >>>>> >>>>> On Fri, Sep 2, 2022 at 10:36 AM Satish Patel <satish.txt@xxxxxxxxx> >>>>> wrote: >>>>> >>>>>> Hi Adam, >>>>>> >>>>>> I run the following command to upgrade but it looks like nothing is >>>>>> happening >>>>>> >>>>>> $ ceph orch upgrade start --image quay.io/ceph/ceph:v16.2.10 >>>>>> >>>>>> Status message is empty.. 
>>>>>> >>>>>> root@ceph1:~# ceph orch upgrade status >>>>>> { >>>>>> "target_image": "quay.io/ceph/ceph:v16.2.10", >>>>>> "in_progress": true, >>>>>> "services_complete": [], >>>>>> "message": "" >>>>>> } >>>>>> >>>>>> Nothing in Logs >>>>>> >>>>>> root@ceph1:~# tail -f >>>>>> /var/log/ceph/f270ad9e-1f6f-11ed-b6f8-a539d87379ea/ceph.cephadm.log >>>>>> 2022-09-02T14:31:52.597661+0000 mgr.ceph2.huidoh (mgr.344392) 174 : >>>>>> cephadm [INF] refreshing ceph2 facts >>>>>> 2022-09-02T14:31:52.991450+0000 mgr.ceph2.huidoh (mgr.344392) 176 : >>>>>> cephadm [INF] refreshing ceph1 facts >>>>>> 2022-09-02T14:32:52.965092+0000 mgr.ceph2.huidoh (mgr.344392) 207 : >>>>>> cephadm [INF] refreshing ceph2 facts >>>>>> 2022-09-02T14:32:53.369789+0000 mgr.ceph2.huidoh (mgr.344392) 208 : >>>>>> cephadm [INF] refreshing ceph1 facts >>>>>> 2022-09-02T14:33:53.367986+0000 mgr.ceph2.huidoh (mgr.344392) 239 : >>>>>> cephadm [INF] refreshing ceph2 facts >>>>>> 2022-09-02T14:33:53.760427+0000 mgr.ceph2.huidoh (mgr.344392) 240 : >>>>>> cephadm [INF] refreshing ceph1 facts >>>>>> 2022-09-02T14:34:53.754277+0000 mgr.ceph2.huidoh (mgr.344392) 272 : >>>>>> cephadm [INF] refreshing ceph2 facts >>>>>> 2022-09-02T14:34:54.162503+0000 mgr.ceph2.huidoh (mgr.344392) 273 : >>>>>> cephadm [INF] refreshing ceph1 facts >>>>>> 2022-09-02T14:35:54.133467+0000 mgr.ceph2.huidoh (mgr.344392) 305 : >>>>>> cephadm [INF] refreshing ceph2 facts >>>>>> 2022-09-02T14:35:54.522171+0000 mgr.ceph2.huidoh (mgr.344392) 306 : >>>>>> cephadm [INF] refreshing ceph1 facts >>>>>> >>>>>> In progress that mesg stuck there for long time >>>>>> >>>>>> root@ceph1:~# ceph -s >>>>>> cluster: >>>>>> id: f270ad9e-1f6f-11ed-b6f8-a539d87379ea >>>>>> health: HEALTH_OK >>>>>> >>>>>> services: >>>>>> mon: 1 daemons, quorum ceph1 (age 9h) >>>>>> mgr: ceph2.huidoh(active, since 9m), standbys: ceph1.smfvfd >>>>>> osd: 4 osds: 4 up (since 9h), 4 in (since 11h) >>>>>> >>>>>> data: >>>>>> pools: 5 pools, 129 pgs >>>>>> objects: 20.06k objects, 83 GiB >>>>>> usage: 168 GiB used, 632 GiB / 800 GiB avail >>>>>> pgs: 129 active+clean >>>>>> >>>>>> io: >>>>>> client: 12 KiB/s wr, 0 op/s rd, 1 op/s wr >>>>>> >>>>>> progress: >>>>>> Upgrade to quay.io/ceph/ceph:v16.2.10 (0s) >>>>>> [............................] >>>>>> >>>>>> On Fri, Sep 2, 2022 at 10:25 AM Satish Patel <satish.txt@xxxxxxxxx> >>>>>> wrote: >>>>>> >>>>>>> It Looks like I did it with the following command. >>>>>>> >>>>>>> $ ceph orch daemon add mgr ceph2:10.73.0.192 >>>>>>> >>>>>>> Now i can see two with same version 15.x >>>>>>> >>>>>>> root@ceph1:~# ceph orch ps --daemon-type mgr >>>>>>> NAME HOST STATUS REFRESHED AGE VERSION >>>>>>> IMAGE NAME >>>>>>> IMAGE ID CONTAINER ID >>>>>>> mgr.ceph1.smfvfd ceph1 running (8h) 41s ago 8h 15.2.17 >>>>>>> quay.io/ceph/ceph@sha256:c08064dde4bba4e72a1f55d90ca32df9ef5aafab82efe2e0a0722444a5aaacca >>>>>>> 93146564743f 1aab837306d2 >>>>>>> mgr.ceph2.huidoh ceph2 running (60s) 110s ago 60s 15.2.17 >>>>>>> quay.io/ceph/ceph@sha256:c08064dde4bba4e72a1f55d90ca32df9ef5aafab82efe2e0a0722444a5aaacca >>>>>>> 93146564743f 294fd6ab6c97 >>>>>>> >>>>>>> On Fri, Sep 2, 2022 at 10:19 AM Satish Patel <satish.txt@xxxxxxxxx> >>>>>>> wrote: >>>>>>> >>>>>>>> Let's come back to the original question: how to bring back the >>>>>>>> second mgr? >>>>>>>> >>>>>>>> root@ceph1:~# ceph orch apply mgr 2 >>>>>>>> Scheduled mgr update... 
>>>>>>>> >>>>>>>> Nothing happened with above command, logs saying nothing >>>>>>>> >>>>>>>> 2022-09-02T14:16:20.407927+0000 mgr.ceph1.smfvfd (mgr.334626) 16939 >>>>>>>> : cephadm [INF] refreshing ceph2 facts >>>>>>>> 2022-09-02T14:16:40.247195+0000 mgr.ceph1.smfvfd (mgr.334626) 16952 >>>>>>>> : cephadm [INF] Saving service mgr spec with placement count:2 >>>>>>>> 2022-09-02T14:16:53.106919+0000 mgr.ceph1.smfvfd (mgr.334626) 16961 >>>>>>>> : cephadm [INF] Saving service mgr spec with placement count:2 >>>>>>>> 2022-09-02T14:17:19.135203+0000 mgr.ceph1.smfvfd (mgr.334626) 16975 >>>>>>>> : cephadm [INF] refreshing ceph1 facts >>>>>>>> 2022-09-02T14:17:20.780496+0000 mgr.ceph1.smfvfd (mgr.334626) 16977 >>>>>>>> : cephadm [INF] refreshing ceph2 facts >>>>>>>> 2022-09-02T14:18:19.502034+0000 mgr.ceph1.smfvfd (mgr.334626) 17008 >>>>>>>> : cephadm [INF] refreshing ceph1 facts >>>>>>>> 2022-09-02T14:18:21.127973+0000 mgr.ceph1.smfvfd (mgr.334626) 17010 >>>>>>>> : cephadm [INF] refreshing ceph2 facts >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Fri, Sep 2, 2022 at 10:15 AM Satish Patel <satish.txt@xxxxxxxxx> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Hi Adam, >>>>>>>>> >>>>>>>>> Wait..wait.. now it's working suddenly without doing anything.. >>>>>>>>> very odd >>>>>>>>> >>>>>>>>> root@ceph1:~# ceph orch ls >>>>>>>>> NAME RUNNING REFRESHED AGE PLACEMENT IMAGE >>>>>>>>> NAME >>>>>>>>> IMAGE ID >>>>>>>>> alertmanager 1/1 5s ago 2w count:1 >>>>>>>>> quay.io/prometheus/alertmanager:v0.20.0 >>>>>>>>> 0881eb8f169f >>>>>>>>> crash 2/2 5s ago 2w * >>>>>>>>> quay.io/ceph/ceph:v15 >>>>>>>>> 93146564743f >>>>>>>>> grafana 1/1 5s ago 2w count:1 >>>>>>>>> quay.io/ceph/ceph-grafana:6.7.4 >>>>>>>>> 557c83e11646 >>>>>>>>> mgr 1/2 5s ago 8h count:2 >>>>>>>>> quay.io/ceph/ceph@sha256:c08064dde4bba4e72a1f55d90ca32df9ef5aafab82efe2e0a0722444a5aaacca >>>>>>>>> 93146564743f >>>>>>>>> mon 1/2 5s ago 8h ceph1;ceph2 >>>>>>>>> quay.io/ceph/ceph:v15 >>>>>>>>> 93146564743f >>>>>>>>> node-exporter 2/2 5s ago 2w * >>>>>>>>> quay.io/prometheus/node-exporter:v0.18.1 >>>>>>>>> e5a616e4b9cf >>>>>>>>> osd.osd_spec_default 4/0 5s ago - <unmanaged> >>>>>>>>> quay.io/ceph/ceph:v15 >>>>>>>>> 93146564743f >>>>>>>>> prometheus 1/1 5s ago 2w count:1 >>>>>>>>> quay.io/prometheus/prometheus:v2.18.1 >>>>>>>>> >>>>>>>>> On Fri, Sep 2, 2022 at 10:13 AM Satish Patel <satish.txt@xxxxxxxxx> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> I can see that in the output but I'm not sure how to get rid of >>>>>>>>>> it. 
>>>>>>>>>> >>>>>>>>>> root@ceph1:~# ceph orch ps --refresh >>>>>>>>>> NAME >>>>>>>>>> HOST STATUS REFRESHED AGE VERSION IMAGE NAME >>>>>>>>>> >>>>>>>>>> IMAGE ID CONTAINER ID >>>>>>>>>> alertmanager.ceph1 >>>>>>>>>> ceph1 running (9h) 64s ago 2w 0.20.0 >>>>>>>>>> quay.io/prometheus/alertmanager:v0.20.0 >>>>>>>>>> 0881eb8f169f ba804b555378 >>>>>>>>>> cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d >>>>>>>>>> ceph2 stopped 65s ago - <unknown> <unknown> >>>>>>>>>> <unknown> >>>>>>>>>> <unknown> >>>>>>>>>> crash.ceph1 >>>>>>>>>> ceph1 running (9h) 64s ago 2w 15.2.17 >>>>>>>>>> quay.io/ceph/ceph:v15 >>>>>>>>>> 93146564743f a3a431d834fc >>>>>>>>>> crash.ceph2 >>>>>>>>>> ceph2 running (9h) 65s ago 13d 15.2.17 >>>>>>>>>> quay.io/ceph/ceph:v15 >>>>>>>>>> 93146564743f 3c963693ff2b >>>>>>>>>> grafana.ceph1 >>>>>>>>>> ceph1 running (9h) 64s ago 2w 6.7.4 >>>>>>>>>> quay.io/ceph/ceph-grafana:6.7.4 >>>>>>>>>> 557c83e11646 7583a8dc4c61 >>>>>>>>>> mgr.ceph1.smfvfd >>>>>>>>>> ceph1 running (8h) 64s ago 8h 15.2.17 >>>>>>>>>> quay.io/ceph/ceph@sha256:c08064dde4bba4e72a1f55d90ca32df9ef5aafab82efe2e0a0722444a5aaacca >>>>>>>>>> 93146564743f 1aab837306d2 >>>>>>>>>> mon.ceph1 >>>>>>>>>> ceph1 running (9h) 64s ago 2w 15.2.17 >>>>>>>>>> quay.io/ceph/ceph:v15 >>>>>>>>>> 93146564743f c1d155d8c7ad >>>>>>>>>> node-exporter.ceph1 >>>>>>>>>> ceph1 running (9h) 64s ago 2w 0.18.1 >>>>>>>>>> quay.io/prometheus/node-exporter:v0.18.1 >>>>>>>>>> e5a616e4b9cf 2ff235fe0e42 >>>>>>>>>> node-exporter.ceph2 >>>>>>>>>> ceph2 running (9h) 65s ago 13d 0.18.1 >>>>>>>>>> quay.io/prometheus/node-exporter:v0.18.1 >>>>>>>>>> e5a616e4b9cf 17678b9ba602 >>>>>>>>>> osd.0 >>>>>>>>>> ceph1 running (9h) 64s ago 13d 15.2.17 >>>>>>>>>> quay.io/ceph/ceph:v15 >>>>>>>>>> 93146564743f d0fd73b777a3 >>>>>>>>>> osd.1 >>>>>>>>>> ceph1 running (9h) 64s ago 13d 15.2.17 >>>>>>>>>> quay.io/ceph/ceph:v15 >>>>>>>>>> 93146564743f 049120e83102 >>>>>>>>>> osd.2 >>>>>>>>>> ceph2 running (9h) 65s ago 13d 15.2.17 >>>>>>>>>> quay.io/ceph/ceph:v15 >>>>>>>>>> 93146564743f 8700e8cefd1f >>>>>>>>>> osd.3 >>>>>>>>>> ceph2 running (9h) 65s ago 13d 15.2.17 >>>>>>>>>> quay.io/ceph/ceph:v15 >>>>>>>>>> 93146564743f 9c71bc87ed16 >>>>>>>>>> prometheus.ceph1 >>>>>>>>>> ceph1 running (9h) 64s ago 2w 2.18.1 >>>>>>>>>> quay.io/prometheus/prometheus:v2.18.1 >>>>>>>>>> de242295e225 74a538efd61e >>>>>>>>>> >>>>>>>>>> On Fri, Sep 2, 2022 at 10:10 AM Adam King <adking@xxxxxxxxxx> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> maybe also a "ceph orch ps --refresh"? It might still have the >>>>>>>>>>> old cached daemon inventory from before you remove the files. >>>>>>>>>>> >>>>>>>>>>> On Fri, Sep 2, 2022 at 9:57 AM Satish Patel < >>>>>>>>>>> satish.txt@xxxxxxxxx> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi Adam, >>>>>>>>>>>> >>>>>>>>>>>> I have deleted file located here - rm >>>>>>>>>>>> /var/lib/ceph/f270ad9e-1f6f-11ed-b6f8-a539d87379ea/cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d >>>>>>>>>>>> >>>>>>>>>>>> But still getting the same error, do i need to do anything >>>>>>>>>>>> else? >>>>>>>>>>>> >>>>>>>>>>>> On Fri, Sep 2, 2022 at 9:51 AM Adam King <adking@xxxxxxxxxx> >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Okay, I'm wondering if this is an issue with version mismatch. >>>>>>>>>>>>> Having previously had a 16.2.10 mgr and then now having a 15.2.17 one that >>>>>>>>>>>>> doesn't expect this sort of thing to be present. 
Either way, I'd think just >>>>>>>>>>>>> deleting this cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d file (and any others like it) >>>>>>>>>>>>> would be the way forward to get orch ls working again. >>>>>>>>>>>>> >>>>>>>>>>>>> On Fri, Sep 2, 2022 at 9:44 AM Satish Patel < >>>>>>>>>>>>> satish.txt@xxxxxxxxx> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Hi Adam, >>>>>>>>>>>>>> >>>>>>>>>>>>>> In the "cephadm ls" output I found the following service, but I believe it >>>>>>>>>>>>>> was there before as well. >>>>>>>>>>>>>> >>>>>>>>>>>>>> { >>>>>>>>>>>>>> "style": "cephadm:v1", >>>>>>>>>>>>>> "name": >>>>>>>>>>>>>> "cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d", >>>>>>>>>>>>>> "fsid": "f270ad9e-1f6f-11ed-b6f8-a539d87379ea", >>>>>>>>>>>>>> "systemd_unit": >>>>>>>>>>>>>> "ceph-f270ad9e-1f6f-11ed-b6f8-a539d87379ea@cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d", >>>>>>>>>>>>>> "enabled": false, >>>>>>>>>>>>>> "state": "stopped", >>>>>>>>>>>>>> "container_id": null, >>>>>>>>>>>>>> "container_image_name": null, >>>>>>>>>>>>>> "container_image_id": null, >>>>>>>>>>>>>> "version": null, >>>>>>>>>>>>>> "started": null, >>>>>>>>>>>>>> "created": null, >>>>>>>>>>>>>> "deployed": null, >>>>>>>>>>>>>> "configured": null >>>>>>>>>>>>>> }, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Looks like the removal didn't work: >>>>>>>>>>>>>> >>>>>>>>>>>>>> root@ceph1:~# ceph orch rm cephadm >>>>>>>>>>>>>> Failed to remove service. <cephadm> was not found. >>>>>>>>>>>>>> >>>>>>>>>>>>>> root@ceph1:~# ceph orch rm >>>>>>>>>>>>>> cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d >>>>>>>>>>>>>> Failed to remove service. >>>>>>>>>>>>>> <cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d> >>>>>>>>>>>>>> was not found. >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, Sep 2, 2022 at 8:27 AM Adam King <adking@xxxxxxxxxx> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> This looks like an old traceback you would get if you ended >>>>>>>>>>>>>>> up with a service type that shouldn't be there somehow. The first thing I'd >>>>>>>>>>>>>>> check is that "cephadm ls" on either host definitely doesn't >>>>>>>>>>>>>>> report any strange things that aren't actually daemons in your cluster, such >>>>>>>>>>>>>>> as "cephadm.<hash>". Another thing you could maybe try, as I believe the >>>>>>>>>>>>>>> assertion it's giving is for an unknown service type here ("AssertionError: >>>>>>>>>>>>>>> cephadm"), is just "ceph orch rm cephadm", which would maybe cause it to >>>>>>>>>>>>>>> remove whatever it thinks is this "cephadm" service that it has deployed. >>>>>>>>>>>>>>> Lastly, you could try having the mgr you manually deploy be a 16.2.10 one >>>>>>>>>>>>>>> instead of 15.2.17 (I'm assuming here, but the line numbers in that >>>>>>>>>>>>>>> traceback suggest Octopus). The 16.2.10 one is just much less likely to >>>>>>>>>>>>>>> have a bug that causes something like this. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Fri, Sep 2, 2022 at 1:41 AM Satish Patel < >>>>>>>>>>>>>>> satish.txt@xxxxxxxxx> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Now when I run "ceph orch ps" it works, but the following >>>>>>>>>>>>>>>> command throws an >>>>>>>>>>>>>>>> error.
Trying to bring up second mgr using ceph orch apply >>>>>>>>>>>>>>>> mgr command but >>>>>>>>>>>>>>>> didn't help >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> root@ceph1:/ceph-disk# ceph version >>>>>>>>>>>>>>>> ceph version 15.2.17 >>>>>>>>>>>>>>>> (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus >>>>>>>>>>>>>>>> (stable) >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> root@ceph1:/ceph-disk# ceph orch ls >>>>>>>>>>>>>>>> Error EINVAL: Traceback (most recent call last): >>>>>>>>>>>>>>>> File "/usr/share/ceph/mgr/mgr_module.py", line 1212, in >>>>>>>>>>>>>>>> _handle_command >>>>>>>>>>>>>>>> return self.handle_command(inbuf, cmd) >>>>>>>>>>>>>>>> File "/usr/share/ceph/mgr/orchestrator/_interface.py", >>>>>>>>>>>>>>>> line 140, in >>>>>>>>>>>>>>>> handle_command >>>>>>>>>>>>>>>> return dispatch[cmd['prefix']].call(self, cmd, inbuf) >>>>>>>>>>>>>>>> File "/usr/share/ceph/mgr/mgr_module.py", line 320, in >>>>>>>>>>>>>>>> call >>>>>>>>>>>>>>>> return self.func(mgr, **kwargs) >>>>>>>>>>>>>>>> File "/usr/share/ceph/mgr/orchestrator/_interface.py", >>>>>>>>>>>>>>>> line 102, in >>>>>>>>>>>>>>>> <lambda> >>>>>>>>>>>>>>>> wrapper_copy = lambda *l_args, **l_kwargs: >>>>>>>>>>>>>>>> wrapper(*l_args, **l_kwargs) >>>>>>>>>>>>>>>> File "/usr/share/ceph/mgr/orchestrator/_interface.py", >>>>>>>>>>>>>>>> line 91, in wrapper >>>>>>>>>>>>>>>> return func(*args, **kwargs) >>>>>>>>>>>>>>>> File "/usr/share/ceph/mgr/orchestrator/module.py", line >>>>>>>>>>>>>>>> 503, in >>>>>>>>>>>>>>>> _list_services >>>>>>>>>>>>>>>> raise_if_exception(completion) >>>>>>>>>>>>>>>> File "/usr/share/ceph/mgr/orchestrator/_interface.py", >>>>>>>>>>>>>>>> line 642, in >>>>>>>>>>>>>>>> raise_if_exception >>>>>>>>>>>>>>>> raise e >>>>>>>>>>>>>>>> AssertionError: cephadm >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Fri, Sep 2, 2022 at 1:32 AM Satish Patel < >>>>>>>>>>>>>>>> satish.txt@xxxxxxxxx> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> > nevermind, i found doc related that and i am able to get >>>>>>>>>>>>>>>> 1 mgr up - >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> https://docs.ceph.com/en/quincy/cephadm/troubleshooting/#manually-deploying-a-mgr-daemon >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > On Fri, Sep 2, 2022 at 1:21 AM Satish Patel < >>>>>>>>>>>>>>>> satish.txt@xxxxxxxxx> wrote: >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> >> Folks, >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> >> I am having little fun time with cephadm and it's very >>>>>>>>>>>>>>>> annoying to deal >>>>>>>>>>>>>>>> >> with it >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> >> I have deployed a ceph cluster using cephadm on two >>>>>>>>>>>>>>>> nodes. Now when i was >>>>>>>>>>>>>>>> >> trying to upgrade and noticed hiccups where it just >>>>>>>>>>>>>>>> upgraded a single mgr >>>>>>>>>>>>>>>> >> with 16.2.10 but not other so i started messing around >>>>>>>>>>>>>>>> and somehow I >>>>>>>>>>>>>>>> >> deleted both mgr in the thought that cephadm will >>>>>>>>>>>>>>>> recreate them. >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> >> Now i don't have any single mgr so my ceph orch command >>>>>>>>>>>>>>>> hangs forever and >>>>>>>>>>>>>>>> >> looks like a chicken egg issue. >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> >> How do I recover from this? If I can't run the ceph orch >>>>>>>>>>>>>>>> command, I won't >>>>>>>>>>>>>>>> >> be able to redeploy my mgr daemons. >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> >> I am not able to find any mgr in the following command >>>>>>>>>>>>>>>> on both nodes. 
>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> >> $ cephadm ls | grep mgr >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> >
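To pull the manual path discussed above together in one place, the sequence comes down to roughly the following (a sketch only; mgr.ceph1.smfvfd and mgr.ceph2.huidoh are the daemon names from this particular cluster, so substitute your own, and check "ceph -s" to see which mgr is the standby before each redeploy):

# stop the stuck upgrade so it doesn't fight the manual redeploys
ceph orch upgrade stop
# redeploy the current standby mgr on the new image (assuming here that ceph1.smfvfd is the standby)
ceph orch daemon redeploy mgr.ceph1.smfvfd quay.io/ceph/ceph:v16.2.10
# fail over so the remaining 15.2.17 mgr becomes the standby, then redeploy it too
ceph mgr fail
ceph orch daemon redeploy mgr.ceph2.huidoh quay.io/ceph/ceph:v16.2.10
# with both mgr daemons on 16.2.10, redeploy the mgr service and restart the upgrade
ceph orch redeploy mgr
ceph orch upgrade start --image quay.io/ceph/ceph:v16.2.10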