[16.2.6] When adding new host, cephadm deploys ceph image that no longer exists

Hello all,

I'm trying to troubleshoot a test cluster that is attempting to deploy an old
quay.io/ceph/ceph@sha256:<hash> image that no longer exists when adding a new
host.

The cluster is running 16.2.6 and was deployed last week with:

    cephadm bootstrap --mon-ip $(facter -p ipaddress) --allow-fqdn-hostname --ssh-user cephadm
    # Within "cephadm shell"
    ceph orch host add <hostname> <IP> _admin
    <repeated for 14 more hosts>

This initial cluster worked fine and the mon/mgr/osd/crash/etc containers were
all running the following image:

    quay.io/ceph/ceph@sha256:31ad0a2bd8182c948cace326251ce1561804d7de948f370c8c44d29a175cc67c

This week, we tried deploying 3 additional hosts using the same "ceph orch host
add" commands. cephadm appears to be trying to deploy the same image, but that
digest no longer exists on quay.io.

The error shows up in the active mgr's logs as:

    Non-zero exit code 125 from /bin/podman run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint stat --init -e CONTAINER_IMAGE=quay.io/ceph/ceph@sha256:31ad0a2bd8182c948cace326251ce1561804d7de948f370c8c44d29a175cc67c -e NODE_NAME=<hostname> -e CEPH_USE_RANDOM_NONCE=1 quay.io/ceph/ceph@sha256:31ad0a2bd8182c948cace326251ce1561804d7de948f370c8c44d29a175cc67c -c %u %g /var/lib/ceph
    stat: stderr Trying to pull quay.io/ceph/ceph@sha256:31ad0a2bd8182c948cace326251ce1561804d7de948f370c8c44d29a175cc67c...
    stat: stderr Error: Error initializing source docker://quay.io/ceph/ceph@sha256:31ad0a2bd8182c948cace326251ce1561804d7de948f370c8c44d29a175cc67c: Error reading manifest sha256:31ad0a2bd8182c948cace326251ce1561804d7de948f370c8c44d29a175cc67c in quay.io/ceph/ceph: manifest unknown: manifest unknown

I suspect this is because of the container_image global config option:

    [ceph: root@<hostname> /]# ceph config-key get config/global/container_image
    quay.io/ceph/ceph@sha256:31ad0a2bd8182c948cace326251ce1561804d7de948f370c8c44d29a175cc67c
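
To make the distinction concrete, here is a minimal sketch of how one could tell
whether the configured value is pinned by digest or refers to a tag. The
function name image_ref_kind is my own and purely illustrative; it only looks
at the string and does not contact the registry.

```shell
#!/bin/sh
# Hypothetical helper (not part of ceph or cephadm): classify a container
# image reference by inspecting the string alone.
# Prints "digest" for references of the form name@sha256:<hash>;
# anything else is treated as tag-based (including an implicit :latest).
image_ref_kind() {
    case "$1" in
        *@sha256:*) echo "digest" ;;
        *)          echo "tag" ;;
    esac
}

# Possible usage inside "cephadm shell" (commented out so the sketch has
# no side effects when sourced):
# ref=$(ceph config-key get config/global/container_image)
# image_ref_kind "$ref"
```

For the value shown above this reports a digest-pinned reference, which is
consistent with the pull failing once that digest disappears from the registry.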

My questions are:

* Is it expected for the cluster to reference a (potentially nonexistent) image
  by sha256 digest rather than by a tag such as :v16 or :v16.2.6?

* What's the best way to get back into a state where new hosts can be added
  again? Is it sufficient to just update the container_image global config?
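
For reference, the change I have in mind for the second question would look
roughly like the sketch below. It assumes that setting container_image through
"ceph config set global" is the right knob and that the :v16.2.6 tag resolves
to a usable image; I have not confirmed either. The wrapper set_container_image
and the DRY_RUN variable are hypothetical names, and by default the function
only prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical wrapper around the config change being asked about.
# With DRY_RUN unset or set to 1, the command is printed, not executed.
set_container_image() {
    image="$1"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "ceph config set global container_image $image"
    else
        # Assumption: run inside "cephadm shell" on a host with admin access.
        ceph config set global container_image "$image"
    fi
}

# Example (dry run, prints the command only):
# set_container_image quay.io/ceph/ceph:v16.2.6
```

Whether this alone is enough for "ceph orch host add" to succeed again is
exactly what I am unsure about.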

Thank you!
Andrew Gunnerson
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx