Thank you very much! The previous attempts at adding new hosts with the missing image seem to have left cephadm in a bad state. We restarted the mgrs and then upgraded to the same version with:

    ceph orch upgrade start --ceph-version 16.2.6

That appears to have deployed new images with the latest digest, and we were able to add hosts successfully afterwards.

On Wed, Sep 29, 2021, at 13:06, David Orman wrote:
> It appears that when an updated container image for 16.2.6 was pushed
> (the first release shipped a remoto version with a bug), the old image
> was removed from quay. We had to update our 16.2.6 clusters to the
> 'new' 16.2.6 version and just did the typical upgrade with the image
> specified. This should resolve your issue, as well as fix the effects
> of the remoto bug:
>
> https://tracker.ceph.com/issues/50526
> https://github.com/alfredodeza/remoto/pull/63
>
> Once you're upgraded, I would expect it to use the correct hash for
> the host adds.
>
> On Wed, Sep 29, 2021 at 11:02 AM Andrew Gunnerson
> <accounts.ceph@xxxxxxxxxxxx> wrote:
>>
>> Hello all,
>>
>> I'm trying to troubleshoot a test cluster that, when adding a new host,
>> attempts to deploy an old quay.io/ceph/ceph@sha256:<hash> image that no
>> longer exists.
>>
>> The cluster is running 16.2.6 and was deployed last week with:
>>
>> cephadm bootstrap --mon-ip $(facter -p ipaddress) --allow-fqdn-hostname --ssh-user cephadm
>> # Within "cephadm shell"
>> ceph orch host add <hostname> <IP> _admin
>> <repeated for 14 more hosts>
>>
>> This initial cluster worked fine, and the mon/mgr/osd/crash/etc. containers
>> were all running the following image:
>>
>> quay.io/ceph/ceph@sha256:31ad0a2bd8182c948cace326251ce1561804d7de948f370c8c44d29a175cc67c
>>
>> This week, we tried deploying 3 additional hosts using the same "ceph orch
>> host add" commands. cephadm attempts to deploy the same image, but it no
>> longer exists on quay.io.
>>
>> The error shows up in the active mgr's logs as:
>>
>> Non-zero exit code 125 from /bin/podman run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint stat --init -e CONTAINER_IMAGE=quay.io/ceph/ceph@sha256:31ad0a2bd8182c948cace326251ce1561804d7de948f370c8c44d29a175cc67c -e NODE_NAME=<hostname> -e CEPH_USE_RANDOM_NONCE=1 quay.io/ceph/ceph@sha256:31ad0a2bd8182c948cace326251ce1561804d7de948f370c8c44d29a175cc67c -c %u %g /var/lib/ceph
>> stat: stderr Trying to pull quay.io/ceph/ceph@sha256:31ad0a2bd8182c948cace326251ce1561804d7de948f370c8c44d29a175cc67c...
>> stat: stderr Error: Error initializing source docker://quay.io/ceph/ceph@sha256:31ad0a2bd8182c948cace326251ce1561804d7de948f370c8c44d29a175cc67c: Error reading manifest sha256:31ad0a2bd8182c948cace326251ce1561804d7de948f370c8c44d29a175cc67c in quay.io/ceph/ceph: manifest unknown: manifest unknown
>>
>> I suspect this is because of the container_image global config option:
>>
>> [ceph: root@<hostname> /]# ceph config-key get config/global/container_image
>> quay.io/ceph/ceph@sha256:31ad0a2bd8182c948cace326251ce1561804d7de948f370c8c44d29a175cc67c
>>
>> My questions are:
>>
>> * Is it expected for the cluster to reference a (potentially nonexistent)
>>   image by sha256 hash rather than (e.g.) the :v16 or :v16.2.6 tags?
>>
>> * What's the best way to get back into a state where new hosts can be
>>   added again? Is it sufficient to just update the container_image global
>>   config?
>>
>> Thank you!
>> Andrew Gunnerson
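
For anyone who hits the same state later, here is a minimal sketch of the
recovery sequence, assuming a standby mgr is available. Only the upgrade
command is exactly what we ran; the mgr failover, status, and verification
steps are illustrative additions:

    # Restart the active mgr by failing over to a standby
    ceph mgr fail
    # Re-run the upgrade to the same version; cephadm resolves the version
    # to the digest currently published on quay.io
    ceph orch upgrade start --ceph-version 16.2.6
    # Watch progress until the upgrade completes
    ceph orch upgrade status
    # Verify the stored image now points at a digest that exists
    ceph config-key get config/global/container_image

    # Alternative suggested above (untested here): upgrade with an
    # explicit image tag instead of a digest
    # ceph orch upgrade start --image quay.io/ceph/ceph:v16.2.6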
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx