Hello. We're rebuilding our OSD nodes. One node went back without any issues, but this one is being stubborn. I attempted to add it back to the cluster and am seeing the error below in our logs:

cephadm ['--image', 'registry.redhat.io/rhceph/rhceph-5-rhel8:16.2.10-160', 'pull']
2024-03-27 19:30:53,901 7f49792ed740 DEBUG /bin/podman: 4.6.1
2024-03-27 19:30:53,905 7f49792ed740 INFO Pulling container image registry.redhat.io/rhceph/rhceph-5-rhel8:16.2.10-160...
2024-03-27 19:30:54,045 7f49792ed740 DEBUG /bin/podman: Trying to pull registry.redhat.io/rhceph/rhceph-5-rhel8:16.2.10-160...
2024-03-27 19:30:54,266 7f49792ed740 DEBUG /bin/podman: Error: initializing source docker://registry.redhat.io/rhceph/rhceph-5-rhel8:16.2.10-160: reading manifest 16.2.10-160 in registry.redhat.io/rhceph/rhceph-5-rhel8: manifest unknown
2024-03-27 19:30:54,270 7f49792ed740 INFO Non-zero exit code 125 from /bin/podman pull registry.redhat.io/rhceph/rhceph-5-rhel8:16.2.10-160
2024-03-27 19:30:54,270 7f49792ed740 INFO /bin/podman: stderr Trying to pull registry.redhat.io/rhceph/rhceph-5-rhel8:16.2.10-160...
2024-03-27 19:30:54,270 7f49792ed740 INFO /bin/podman: stderr Error: initializing source docker://registry.redhat.io/rhceph/rhceph-5-rhel8:16.2.10-160: reading manifest 16.2.10-160 in registry.redhat.io/rhceph/rhceph-5-rhel8: manifest unknown
2024-03-27 19:30:54,270 7f49792ed740 ERROR ERROR: Failed command: /bin/podman pull registry.redhat.io/rhceph/rhceph-5-rhel8:16.2.10-160

$ ceph versions
{
    "mon": {
        "ceph version 16.2.10-208.el8cp (791f73fbb4bbca2ffe53a2ea0f8706dbffadcc0b) pacific (stable)": 1,
        "ceph version 16.2.10-248.el8cp (0edb63afd9bd3edb333364f2e0031b77e62f4896) pacific (stable)": 2
    },
    "mgr": {
        "ceph version 16.2.10-208.el8cp (791f73fbb4bbca2ffe53a2ea0f8706dbffadcc0b) pacific (stable)": 1,
        "ceph version 16.2.10-248.el8cp (0edb63afd9bd3edb333364f2e0031b77e62f4896) pacific (stable)": 2
    },
    "osd": {
        "ceph version 16.2.10-160.el8cp (6977980612de1db28e41e0a90ff779627cde7a8c) pacific (stable)": 160
    },
    "mds": {},
    "rgw": {
        "ceph version 16.2.10-208.el8cp (791f73fbb4bbca2ffe53a2ea0f8706dbffadcc0b) pacific (stable)": 3
    },
    "overall": {
        "ceph version 16.2.10-160.el8cp (6977980612de1db28e41e0a90ff779627cde7a8c) pacific (stable)": 160,
        "ceph version 16.2.10-208.el8cp (791f73fbb4bbca2ffe53a2ea0f8706dbffadcc0b) pacific (stable)": 5,
        "ceph version 16.2.10-248.el8cp (0edb63afd9bd3edb333364f2e0031b77e62f4896) pacific (stable)": 4
    }
}

I don't understand why it's trying to pull 16.2.10-160, which doesn't exist. The container images we have locally are:

registry.redhat.io/rhceph/rhceph-5-dashboard-rhel8   5       93b3137e7a65   11 months ago   696 MB
registry.redhat.io/rhceph/rhceph-5-rhel8             5-416   838cea16e15c   11 months ago   1.02 GB
registry.redhat.io/openshift4/ose-prometheus         v4.6    ec2d358ca73c   17 months ago   397 MB

This happens when using cephadm-ansible as well as when exporting and re-applying the service spec (a rough sketch of that spec is at the end of this message):

$ ceph orch ls --export --service_name xxx > xxx.yml
$ sudo ceph orch apply -i xxx.yml

I also tried:

$ ceph orch daemon add osd host:/dev/sda

which, surprisingly, created a volume on host:/dev/sda and created an OSD that I can see in the output of ceph osd tree. However, it did not get added to the host, I suspect because of the same podman error, and now I'm unable to remove it. Running

$ ceph orch osd rm

does not work, even with the --force flag; I stopped the removal with

$ ceph orch osd rm stop

after 10+ minutes. I'm considering running

$ ceph osd purge osd# --force

but I'm worried it may only make things worse. ceph -s shows that OSD, but it is neither up nor in.

Thanks, and looking forward to any advice!
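P.S. For reference, the exported OSD service spec looks roughly like the following. This is only a sketch: the service_id, host pattern, and device filter shown here are placeholders rather than our actual values.

# xxx.yml -- approximate shape of the output of "ceph orch ls --export --service_name xxx"
service_type: osd
service_id: xxx                  # placeholder service name
service_name: osd.xxx
placement:
  host_pattern: '*'              # placeholder; ours targets the rebuilt OSD hosts
spec:
  data_devices:
    all: true                    # placeholder device filter
  filter_logic: AND
  objectstore: bluestore

We apply it unchanged with "sudo ceph orch apply -i xxx.yml", as described above.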