To pull quay.io/prometheus/node-exporter:v1.5.0 the nodes would need access
to the internet, yes. I don't fully understand the reason for

> root@node-01:~# ceph config set mgr mgr/cephadm/container_image_node_exporter quay.io/prometheus/node-exporter:v1.5.0

though. Why not point it at the image in your local registry, similar to
the one you passed to the upgrade command? Beyond that, it's hard to tell
more without a snippet from the journal logs of a failed node-exporter
daemon and potentially the output of `ceph health detail`.

I'll also point out that the node-exporter daemons won't automatically be
redeployed when the config option for their container image is changed.
That still has to be triggered by an additional `ceph orch redeploy
node-exporter` command once the config setting is correct (see
https://docs.ceph.com/en/latest/cephadm/services/monitoring/#using-custom-images).

You can check the /var/lib/ceph/<FSID>/<node-exporter-daemon-name>/unit.run
file on a host with a node-exporter daemon to know for sure which image the
daemon attempted to start with.
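For example, based on the image paths in your `docker images` output (I'm
inferring the repository paths from your registry listing, so adjust them
if they differ), the equivalent for the whole monitoring stack would be
something like:

  ceph config set mgr mgr/cephadm/container_image_node_exporter 192.168.1.10:5000/prometheus/node-exporter:v1.5.0
  ceph config set mgr mgr/cephadm/container_image_prometheus 192.168.1.10:5000/prometheus/prometheus:v2.43.0
  ceph config set mgr mgr/cephadm/container_image_alertmanager 192.168.1.10:5000/prometheus/alertmanager:v0.25.0
  ceph config set mgr mgr/cephadm/container_image_grafana 192.168.1.10:5000/ceph/ceph-grafana:9.4.7

  # the config change alone doesn't touch running daemons;
  # the redeploy has to be triggered explicitly afterwards
  ceph orch redeploy node-exporter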
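If node-exporter still ends up in error state after that, this is roughly
how I'd gather the information I mentioned above (node-exporter.node-01 is
a placeholder daemon name; `ceph orch ps` shows the actual ones):

  ceph health detail
  ceph orch ps | grep node-exporter

  # on the host carrying the failed daemon:
  cephadm logs --name node-exporter.node-01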
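And a quick way to pull the image reference out of unit.run (assuming
`ceph fsid` works on that host, i.e. an admin keyring is present; otherwise
substitute the FSID by hand):

  grep -oE '[^ ]*node-exporter[^ ]*' /var/lib/ceph/$(ceph fsid)/node-exporter.*/unit.run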
On Mon, Jul 15, 2024 at 1:10 PM Saif Mohammad <samdto987@xxxxxxxxx> wrote:

> Hello,
>
> We are facing an issue with node-exporter entering an error state while
> upgrading our cluster in an air-gapped environment.
> Specifically, we are upgrading from quincy v17.2.0 to reef v18.2.2. To
> facilitate this upgrade, we have set up a custom registry on a separate
> machine within the same network and pushed the required images to this
> private registry.
>
> Here are the images that we have pushed:
>
> root@custom-registry:~# docker images
> REPOSITORY                                   TAG       IMAGE ID       CREATED         SIZE
> 192.168.1.10:5000/ceph/ceph                  v18.2.2   3c937764e6f5   7 weeks ago     1.25GB
> 192.168.1.10:5000/ceph/ceph-grafana          9.4.7     954c08fa6188   7 months ago    633MB
> 192.168.1.10:5000/prometheus/prometheus      v2.43.0   a07b618ecd1d   16 months ago   234MB
> 192.168.1.10:5000/prometheus/alertmanager    v0.25.0   c8568f914cd2   19 months ago   65.1MB
> 192.168.1.10:5000/prometheus/node-exporter   v1.5.0    0da6a335fe13   19 months ago   22.5MB
>
> Since we have configured an insecure private registry, we added the
> following lines to the "/etc/containers/registries.conf" file on each
> node of the cluster:
>
> [[registry]]
> location = "192.168.1.10:5000"
> insecure = true
>
> The Ceph image upgrade was done with the command (ceph orch upgrade start
> --image 192.168.1.10:5000/ceph/ceph:v18.2.2), but we encountered issues
> with the node-exporter image, which fails to start and remains in an
> error state.
>
> For the remaining images (except ceph) we used the "ceph config" command.
> When we query the image values with the following command, we get what
> seem to be the default ones:
>
> root@node-01:~# ceph config get mgr mgr/cephadm/container_image_node_exporter
> quay.io/prometheus/node-exporter:v1.5.0
>
> Is an internet connection required to pull these images (the monitoring
> stack component images)? I am unsure, because whenever we run the
> following command, node-exporter enters an error state and does not come
> back into a running state even after a redeploy:
>
> root@node-01:~# ceph config set mgr mgr/cephadm/container_image_node_exporter quay.io/prometheus/node-exporter:v1.5.0
>
> What is the best approach to upgrading the cluster, including all images,
> in an air-gapped environment?
>
> Any guidance on resolving this issue would be appreciated.
>
> Regards,
> Mohammad Saif

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx