It appears that when an updated container for 16.2.6 was pushed (the first
release included a remoto version with a bug), the old one was removed from
quay. We had to update our 16.2.6 clusters to the 'new' 16.2.6 version, and
just did the typical upgrade with the image specified.

This should resolve your issue, as well as fix the effects of the remoto bug:

https://tracker.ceph.com/issues/50526
https://github.com/alfredodeza/remoto/pull/63

Once you're upgraded, I would expect it to use the correct hash for the host
adds.

On Wed, Sep 29, 2021 at 11:02 AM Andrew Gunnerson
<accounts.ceph@xxxxxxxxxxxx> wrote:
>
> Hello all,
>
> I'm trying to troubleshoot a test cluster that is attempting to deploy an old
> quay.io/ceph/ceph@sha256:<hash> image that no longer exists when adding a new
> host.
>
> The cluster is running 16.2.6 and was deployed last week with:
>
>     cephadm bootstrap --mon-ip $(facter -p ipaddress) --allow-fqdn-hostname --ssh-user cephadm
>     # Within "cephadm shell"
>     ceph orch host add <hostname> <IP> _admin
>     <repeated for 14 more hosts>
>
> This initial cluster worked fine and the mon/mgr/osd/crash/etc containers were
> all running the following image:
>
>     quay.io/ceph/ceph@sha256:31ad0a2bd8182c948cace326251ce1561804d7de948f370c8c44d29a175cc67c
>
> This week, we tried deploying 3 additional hosts using the same "ceph orch host
> add" commands and cephadm seems to be attempting to deploy the same image, but
> it no longer exists on quay.io.
>
> The error shows up in the active mgr's logs as:
>
>     Non-zero exit code 125 from /bin/podman run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint stat --init -e CONTAINER_IMAGE=quay.io/ceph/ceph@sha256:31ad0a2bd8182c948cace326251ce1561804d7de948f370c8c44d29a175cc67c -e NODE_NAME=<hostname> -e CEPH_USE_RANDOM_NONCE=1 quay.io/ceph/ceph@sha256:31ad0a2bd8182c948cace326251ce1561804d7de948f370c8c44d29a175cc67c -c %u %g /var/lib/ceph
>     stat: stderr Trying to pull quay.io/ceph/ceph@sha256:31ad0a2bd8182c948cace326251ce1561804d7de948f370c8c44d29a175cc67c...
>     stat: stderr Error: Error initializing source docker://quay.io/ceph/ceph@sha256:31ad0a2bd8182c948cace326251ce1561804d7de948f370c8c44d29a175cc67c: Error reading manifest sha256:31ad0a2bd8182c948cace326251ce1561804d7de948f370c8c44d29a175cc67c in quay.io/ceph/ceph: manifest unknown: manifest unknown
>
> I suspect this is because of the container_image global config option:
>
>     [ceph: root@<hostname> /]# ceph config-key get config/global/container_image
>     quay.io/ceph/ceph@sha256:31ad0a2bd8182c948cace326251ce1561804d7de948f370c8c44d29a175cc67c
>
> My questions are:
>
> * Is it expected for the cluster to reference a (potentially nonexistent) image
>   by sha256 hash versus (eg.) the :v16 or :v16.2.6 tags?
>
> * What's the best way to get back into a state where new hosts can be added
>   again? Is it sufficient to just update the container_image global config?
>
> Thank you!
> Andrew Gunnerson
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
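
For reference, the "typical upgrade with the image specified" mentioned in the
reply at the top of this thread might look roughly like the sketch below. The
image reference quay.io/ceph/ceph:v16.2.6 is an assumption (the thread does not
say which image was actually passed), and the run helper and the DRY_RUN
variable are hypothetical conveniences so that the script only prints the
commands by default.

```shell
#!/bin/sh
# Sketch only: prints each command unless DRY_RUN=0 is set.
# IMAGE is an assumption; the thread does not name the exact image used.
IMAGE="${IMAGE:-quay.io/ceph/ceph:v16.2.6}"

# Hypothetical helper: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Show the image reference the cluster currently records
# (the same query used in the original post).
run ceph config-key get config/global/container_image

# Start an orchestrator upgrade to the re-published image.
run ceph orch upgrade start --image "$IMAGE"

# Check progress; repeat until the upgrade has finished.
run ceph orch upgrade status
```

Run inside "cephadm shell" with DRY_RUN=0 to execute the commands for real.
Once the upgrade has finished, the config-key query should show whether the
recorded image reference changed, which is what the reply expects to fix the
host adds.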