Re: Issue upgrading 17.2.0 to 17.2.5


 



> Current cluster status says healthy, but I cannot deploy new daemons. The
> mgr information isn't refreshing (the info is 5 days old) under hosts and
> services, but the main dashboard is accurate, like ceph -s.
> ceph -s shows accurate information, but things like ceph orch ps
> --daemon-type mgr say that I have 5 MGRs running, which is inaccurate,
> and it won't let me remove them manually because it says they're not found.
>
Can you try a mgr failover (ceph mgr fail), wait ~5 minutes, and then see
what actually gets refreshed (as in, check the refreshed column in "ceph
orch ps" and "ceph orch device ls")? Typically, when it is "stuck" like this
and not refreshing, something is blocking the refresh on one specific host,
so it would be good to see whether most hosts refresh and the refresh fails
only on specific host(s).
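
As a rough way to do that comparison, here is a minimal Python sketch that groups daemons by host and lists the hosts where nothing has refreshed recently. It assumes the JSON from "ceph orch ps --format json" is a list of objects with "hostname" and "last_refresh" fields holding ISO-8601 timestamps; those field names, the stale_hosts helper, and the 10-minute threshold are assumptions for illustration, so check them against your own output.

```python
from datetime import datetime, timedelta, timezone


def parse_ts(text):
    """Parse an ISO-8601 timestamp; a trailing 'Z' or missing zone is treated as UTC."""
    ts = datetime.fromisoformat(text.replace("Z", "+00:00"))
    return ts if ts.tzinfo else ts.replace(tzinfo=timezone.utc)


def stale_hosts(daemons, now, max_age=timedelta(minutes=10)):
    """Return hosts (sorted) where no daemon was refreshed within max_age.

    `daemons` is assumed to be the parsed output of
    "ceph orch ps --format json" (field names are an assumption).
    """
    newest = {}
    for d in daemons:
        host = d["hostname"]
        newest.setdefault(host, None)
        raw = d.get("last_refresh")
        if raw:
            seen = parse_ts(raw)
            if newest[host] is None or seen > newest[host]:
                newest[host] = seen
    return sorted(
        host for host, ts in newest.items()
        if ts is None or now - ts > max_age
    )
```

To use it, load the command output with json.loads and pass datetime.now(timezone.utc) as now. If only one or two hosts come back, those are the ones to look at first.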

> osd.11    basic    container_image    stop
> osd.47    basic    container_image    17.2.5    *
> osd.49    basic    container_image    17.2.5    *


That looks bad. It might be worth trying just a
"ceph config set osd container_image
quay.io/ceph/ceph@sha256:2b73ccc9816e0a1ee1dfbe21ba9a8cc085210f1220f597b5050ebfcac4bdd346"
to get all the OSD config options onto a valid image. With the current
values, it will try to use the image "stop" or "17.2.5" when redeploying or
upgrading those OSDs.
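
To find every such entry rather than spotting them by eye, a small Python sketch along these lines could scan the config dump. It assumes "ceph config dump --format json" returns a list of objects with "section", "name", and "value" fields (an assumption about the JSON layout), and it uses a crude heuristic that is mine, not cephadm's: a registry-qualified image reference contains at least one "/", which "stop" and "17.2.5" do not.

```python
def suspicious_images(config_rows):
    """Return (section, value) pairs whose container_image lacks a registry path.

    `config_rows` is assumed to be the parsed output of
    "ceph config dump --format json" (field names are an assumption).
    """
    bad = []
    for row in config_rows:
        if row.get("name") != "container_image":
            continue
        value = row.get("value", "")
        # Heuristic only: "quay.io/ceph/ceph@sha256:..." contains "/",
        # while a bare word such as "stop" or "17.2.5" does not.
        if "/" not in value:
            bad.append((row.get("section"), value))
    return bad
```

Any section this reports is one where a redeploy or upgrade would attempt to pull the bare value as an image name, which matches the "docker pull 17.2.5" failure in the log quoted below.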

On Tue, Mar 7, 2023 at 11:40 AM <aellahib@xxxxxxxxx> wrote:

> Hello, at this point I've tried to upgrade a few times, so I believe the
> command is long gone. On another forum someone was alluding that I
> accidentally set the image to "stop" instead of running a proper upgrade
> stop command. I couldn't find anything like that on the hosts I ran
> commands from, but I wouldn't be surprised if I accidentally pasted and
> then wrote additional commands after it.
>
> The failing OSD was interesting: Ceph didn't report it as a stray daemon,
> but I noticed it was showing as a daemon and not as an actual OSD for
> storage in Ceph, so I attempted to remove it, and it would eventually come
> back.
>
> It had upgraded all the managers and mons to 17.2.5. Some OSDs had
> upgraded as well.
> Current cluster status says healthy, but I cannot deploy new daemons. The
> mgr information isn't refreshing (the info is 5 days old) under hosts and
> services, but the main dashboard is accurate, like ceph -s.
> ceph -s shows accurate information, but things like ceph orch ps
> --daemon-type mgr say that I have 5 MGRs running, which is inaccurate,
> and it won't let me remove them manually because it says they're not found.
>
> ERROR: Failed command: /usr/bin/docker pull 17.2.5
> 2023-03-06T09:26:55.925386-0700 mgr.mgr.idvkbw [DBG] serve loop sleep
> 2023-03-06T09:26:55.925507-0700 mgr.mgr.idvkbw [DBG] Sleeping for 60
> seconds
> 2023-03-06T09:27:55.925847-0700 mgr.mgr.idvkbw [DBG] serve loop wake
> 2023-03-06T09:27:55.925959-0700 mgr.mgr.idvkbw [DBG] serve loop start
> 2023-03-06T09:27:55.929849-0700 mgr.mgr.idvkbw [DBG] mon_command: 'config
> dump' -> 0 in 0.004s
> 2023-03-06T09:27:55.931625-0700 mgr.mgr.idvkbw [DBG] _run_cephadm :
> command = pull
> 2023-03-06T09:27:55.932025-0700 mgr.mgr.idvkbw [DBG] _run_cephadm : args =
> []
> 2023-03-06T09:27:55.932469-0700 mgr.mgr.idvkbw [DBG] args: --image 17.2.5
> --no-container-init pull
> 2023-03-06T09:27:55.932925-0700 mgr.mgr.idvkbw [DBG] Running command:
> which python3
> 2023-03-06T09:27:55.968793-0700 mgr.mgr.idvkbw [DBG] Running command:
> /usr/bin/python3
> /var/lib/ceph/5058e342-dac7-11ec-ada3-01065e90228d/cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4e
> --image 17.2.5 --no-container-init pull
> 2023-03-06T09:27:57.278932-0700 mgr.mgr.idvkbw [DBG] code: 1
> 2023-03-06T09:27:57.279045-0700 mgr.mgr.idvkbw [DBG] err: Pulling
> container image 17.2.5...
> Non-zero exit code 1 from /usr/bin/docker pull 17.2.5
> /usr/bin/docker: stdout Using default tag: latest
> /usr/bin/docker: stderr Error response from daemon: pull access denied for
> 17.2.5, repository does not exist or may require 'docker login': denied:
> requested access to the resource is denied
> ERROR: Failed command: /usr/bin/docker pull 17.2.5
>
> 2023-03-06T09:27:57.280517-0700 mgr.mgr.idvkbw [DBG] serve loop
>
> I had stopped the upgrade before, so it's at:
> neteng@mon:~$ ceph orch upgrade status
> {
>     "target_image": null,
>     "in_progress": false,
>     "which": "<unknown>",
>     "services_complete": [],
>     "progress": null,
>     "message": "",
>     "is_paused": false
> }
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
>



