Re: ceph orchestrator pulls strange images from



On 15-09-2023 09:21, Boris Behrens wrote:
Hi Stefan,

the cluster is running 17.2.6 across the board. The mentioned containers with other versions don't show up in ceph -s or ceph versions.
It looks like it is host related.
One host gets the correct 17.2.6 images, one gets the 16.2.11 images, and the third one uses the 7.0.0-7183-g54142666 (whatever this is) images.

root@0cc47a6df330:~# ceph config-key get config/global/container_image

root@0cc47a6df330:~# ceph config-key list |grep container_image

I've tried to set the default image with ceph config-key set config/global/container_image <>
But I cannot redeploy the mgr daemons, because there is no standby daemon.
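As a side note, on Quincy the default image is usually set through the mon config store rather than config-key; a minimal sketch, assuming quay.io/ceph/ceph:v17.2.6 is the image you want (adjust the tag to your release):

```shell
# Set the default container image in the config store
# (the image reference here is an assumption, pick your own tag):
ceph config set global container_image quay.io/ceph/ceph:v17.2.6

# Check what is actually configured:
ceph config get mon container_image
```

Daemons only pick this up on redeploy/upgrade, which is exactly the chicken-and-egg problem with the missing standby mgr below.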

root@0cc47a6df330:~# ceph orch redeploy mgr
Error EINVAL: Unable to schedule redeploy for mgr.0cc47aad8ce8: No standby MGR

But there should be:
root@0cc47a6df330:~# ceph orch ps
NAME                     HOST          PORTS   STATUS         REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
mgr.0cc47a6df14e.iltiot  0cc47a6df14e  *:9283  running (23s)  22s ago    2m   10.6M    -        16.2.11  de4b0b384ad4  0f31a162fa3e
mgr.0cc47aad8ce8         0cc47aad8ce8          running (16h)  8m ago     16h  591M     -        17.2.6   22cd8daf4d70  8145c63fdc44

I guess that one of the managers is not working correctly (probably the 16.2.11 one). IIRC I once changed the image reference for a container (in the systemd unit files) after I had managed to redeploy all containers with a non-working image (test setup). So first make sure which manager is actually running, then try to fix the other one by editing the relevant config for that container (point it to the same image as the running container). Pull the necessary image first if need be. Once you have a standby manager up and running, you can redeploy the necessary daemons. Be careful ... there are commands that redeploy all daemons at the same time, and you normally don't want that ;-).
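The per-daemon image reference Stefan mentions lives in the files cephadm writes under /var/lib/ceph/<fsid>/<daemon>/ (unit.run in particular). A hedged sketch of the manual fix, where the fsid, daemon name, and image tag are taken from the output above and would need adjusting:

```shell
# Find the image the broken standby mgr is configured to run
# (daemon name taken from the 'ceph orch ps' output above):
FSID=$(ceph fsid)
grep image /var/lib/ceph/$FSID/mgr.0cc47a6df14e.iltiot/unit.run

# Pull the image the working 17.2.6 manager uses
# (image reference is an assumption, match it to your running mgr):
podman pull quay.io/ceph/ceph:v17.2.6

# After editing unit.run to point at that image, restart the daemon:
systemctl restart ceph-$FSID@mgr.0cc47a6df14e.iltiot.service
```

This is hand-surgery on cephadm-managed state, so treat it as a last resort to get a standby mgr up; once both managers run the same version, let the orchestrator redeploy things properly.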

root@0cc47a6df330:~# ceph orch ls
mgr              2/2  8m ago     19h  0cc47a6df14e;0cc47a6df330;0cc47aad8ce8

I've also removed podman and containerd, killed all directories and then did a fresh reinstall of podman, which also did not work. It's also strange that the daemons with the wonky version got an extra suffix.

If I knew how, I would happily nuke the whole orchestrator, podman and everything that goes along with it, and start over. In the end it is not that hard to start some mgr/mon daemons without podman, so I would be back to a classical cluster. I tried this yesterday, but the daemons still use those very strange images and I just don't understand why.

I could just nuke the whole dev cluster, wipe all disks and start fresh after reinstalling the hosts, but as I have to adopt 17 clusters into the orchestrator, I'd rather get some learnings from the broken one :)

There is actually a cephadm "kill it with fire" option to do that for you, but yeah, make sure you know how to fix it when things do not go according to plan. It all magically works, until it doesn't ;-).
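The "kill it with fire" option is cephadm's rm-cluster subcommand; a sketch, with the fsid left as a placeholder you'd fill in from 'ceph fsid':

```shell
# DANGER: removes all cluster daemons and their data from this host.
# Must be run on every host that belongs to the cluster.
cephadm rm-cluster --force --fsid <fsid>
```

Make sure you have the fsid noted down first, since the cluster may no longer be able to tell you once you start tearing hosts down.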

Good luck, and keep us updated with any further challenges / progress.

Gr. Stefan
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
