To pull quay.io/prometheus/node-exporter:v1.5.0 the nodes would need access
to the internet, yes. I don't fully understand the reason for

> root@node-01:~# ceph config set mgr mgr/cephadm/container_image_node_exporter quay.io/prometheus/node-exporter:v1.5.0

though. Why not point it at the image in your local registry, similar to
the one you passed to the upgrade command? Beyond that, it's hard to tell
more without a snippet from the journal logs of a failed node-exporter
daemon and potentially the output of `ceph health detail`.

I'll also point out that the node-exporter daemons won't automatically be
redeployed when the config option for their container image is changed.
That still has to be triggered by an additional `ceph orch redeploy
node-exporter` command once the config setting is correct (see
https://docs.ceph.com/en/latest/cephadm/services/monitoring/#using-custom-images).

You can check the /var/lib/ceph/<FSID>/<node-exporter-daemon-name>/unit.run
file on a host with a node-exporter daemon to know for sure which image the
daemon attempted to start with.
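For example, based on the image paths in your `docker images` output (I'm
inferring the repository paths from your registry listing, so adjust them
if they differ), the equivalent for the whole monitoring stack would be
something like:

  ceph config set mgr mgr/cephadm/container_image_node_exporter 192.168.1.10:5000/prometheus/node-exporter:v1.5.0
  ceph config set mgr mgr/cephadm/container_image_prometheus 192.168.1.10:5000/prometheus/prometheus:v2.43.0
  ceph config set mgr mgr/cephadm/container_image_alertmanager 192.168.1.10:5000/prometheus/alertmanager:v0.25.0
  ceph config set mgr mgr/cephadm/container_image_grafana 192.168.1.10:5000/ceph/ceph-grafana:9.4.7

  # the config change alone doesn't touch running daemons;
  # the redeploy has to be triggered explicitly afterwards
  ceph orch redeploy node-exporter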
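If node-exporter still ends up in error state after that, this is roughly
how I'd gather the information I mentioned above (node-exporter.node-01 is
a placeholder daemon name; `ceph orch ps` shows the actual ones):

  ceph health detail
  ceph orch ps | grep node-exporter

  # on the host carrying the failed daemon:
  cephadm logs --name node-exporter.node-01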
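And a quick way to pull the image reference out of unit.run (assuming
`ceph fsid` works on that host, i.e. an admin keyring is present; otherwise
substitute the FSID by hand):

  grep -oE '[^ ]*node-exporter[^ ]*' /var/lib/ceph/$(ceph fsid)/node-exporter.*/unit.run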
On Mon, Jul 15, 2024 at 1:10 PM Saif Mohammad <samdto987@xxxxxxxxx> wrote:

> Hello,
>
> We are facing an issue with node-exporter entering an error state while
> upgrading our cluster in an air-gapped environment.
> Specifically, we are upgrading from quincy v17.2.0 to reef v18.2.2. To
> facilitate this upgrade, we have set up a custom registry on a separate
> machine within the same network and pushed the required images to this
> private registry.
>
> Here are the images that we have pushed:
>
> root@custom-registry:~# docker images
> REPOSITORY                                   TAG       IMAGE ID       CREATED         SIZE
> 192.168.1.10:5000/ceph/ceph                  v18.2.2   3c937764e6f5   7 weeks ago     1.25GB
> 192.168.1.10:5000/ceph/ceph-grafana          9.4.7     954c08fa6188   7 months ago    633MB
> 192.168.1.10:5000/prometheus/prometheus      v2.43.0   a07b618ecd1d   16 months ago   234MB
> 192.168.1.10:5000/prometheus/alertmanager    v0.25.0   c8568f914cd2   19 months ago   65.1MB
> 192.168.1.10:5000/prometheus/node-exporter   v1.5.0    0da6a335fe13   19 months ago   22.5MB
>
> Since we have configured an insecure private registry, we added the
> following lines to the "/etc/containers/registries.conf" file on each
> node of the cluster:
>
> [[registry]]
> location = "192.168.1.10:5000"
> insecure = true
>
> The Ceph image upgrade was done with the command (ceph orch upgrade start
> --image 192.168.1.10:5000/ceph/ceph:v18.2.2), but we encountered issues
> with the node-exporter image, which fails to start and remains in an
> error state.
>
> For the remaining images (except ceph) we used the "ceph config" command.
> When we query the image values with the following command, we get what
> seem to be the default ones:
>
> root@node-01:~# ceph config get mgr mgr/cephadm/container_image_node_exporter
> quay.io/prometheus/node-exporter:v1.5.0
>
> Is an internet connection required to pull these images (the monitoring
> stack component images)? I am unsure, because whenever we run the
> following command, node-exporter enters an error state and does not come
> back into a running state even after a redeploy:
>
> root@node-01:~# ceph config set mgr mgr/cephadm/container_image_node_exporter quay.io/prometheus/node-exporter:v1.5.0
>
> What is the best approach to upgrading the cluster, including all images,
> in an air-gapped environment?
>
> Any guidance on resolving this issue would be appreciated.
>
> Regards,
> Mohammad Saif

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx