Upgrade failing to progress

Greetings! This is odd to me because most of the time the error messages I see point me to the fix/issue, but not in this case (or at least I'm not seeing it).

Via cephadm I was attempting to upgrade my cluster from 16.2.5 to 16.2.7, and it seems to be failing while attempting to get the uid and gid. Here is the upgrade command I ran:
ceph orch upgrade start <internal repo>/ceph/ceph:v16.2.7
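(In case it's useful for anyone reproducing this: as far as I understand the docs, these are the standard orchestrator commands for checking on and controlling the upgrade, nothing custom on my end:)
ceph orch upgrade status     # shows the target image and whether the upgrade is still running
ceph -s                      # overall health plus the upgrade progress bar
ceph orch upgrade pause      # pause without aborting
ceph orch upgrade stop       # abort the upgrade entirely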
Here is what I'm getting in the logs:
2022-08-02T09:30:00.000174-0600 mon.openstack-mon01.b.pc.ostk.com [WRN] Health detail: HEALTH_WARN 12 failed cephadm daemon(s); Redeploying daemon prometheus.openstack-mon01 on host openstack-mon01 failed.
2022-08-02T09:30:00.000210-0600 mon.openstack-mon01 [WRN] [WRN] CEPHADM_FAILED_DAEMON: 12 failed cephadm daemon(s)
2022-08-02T09:30:00.000232-0600 mon.openstack-mon01 [WRN]     daemon node-exporter.openstack-mon01 on openstack-mon01m is in error state
2022-08-02T09:30:00.000247-0600 mon.openstack-mon01 [WRN]     daemon node-exporter.openstack-mon02 on openstack-mon02 is in error state
2022-08-02T09:30:00.000262-0600 mon.openstack-mon01 [WRN]     daemon node-exporter.openstack-mon03 on openstack-mon03 is in error state
2022-08-02T09:30:00.000272-0600 mon.openstack-mon01 [WRN]     daemon node-exporter.openstack-osd01 on openstack-osd01 is in error state
2022-08-02T09:30:00.000290-0600 mon.openstack-mon01 [WRN]     daemon node-exporter.openstack-osd02 on openstack-osd02 is in error state
2022-08-02T09:30:00.000327-0600 mon.openstack-mon01 [WRN]     daemon node-exporter.openstack-osd03 on openstack-osd03 is in error state
2022-08-02T09:30:00.000352-0600 mon.openstack-mon01 [WRN]     daemon node-exporter.openstack-osd04 on openstack-osd04 is in error state
2022-08-02T09:30:00.000359-0600 mon.openstack-mon01 [WRN]     daemon node-exporter.openstack-osd05 on openstack-osd05 is in error state
2022-08-02T09:30:00.000365-0600 mon.openstack-mon01 [WRN]     daemon node-exporter.openstack-osd06 on openstack-osd06 is in error state
2022-08-02T09:30:00.000371-0600 mon.openstack-mon01 [WRN]     daemon node-exporter.openstack-osd07 on openstack-osd07 is in error state
2022-08-02T09:30:00.000377-0600 mon.openstack-mon01 [WRN]     daemon node-exporter.openstack-osd08 on openstack-osd08 is in error state
2022-08-02T09:30:00.000383-0600 mon.openstack-mon01 [WRN]     daemon node-exporter.openstack-osd09 on openstack-osd09 is in error state
2022-08-02T09:30:00.000389-0600 mon.openstack-mon01 [WRN] [WRN] UPGRADE_REDEPLOY_DAEMON: Redeploying daemon prometheus.openstack-mon01 on host openstack-mon01.b.pc.ostk.com failed.
2022-08-02T09:30:00.000397-0600 mon.openstack-mon01 [WRN]     Upgrade daemon: prometheus.openstack-mon01: cephadm exited with an error code: 1, stderr:Redeploy daemon prometheus.openstack-mon01 ...
2022-08-02T09:30:00.000413-0600 mon.openstack-mon01 [WRN] Non-zero exit code 1 from /usr/bin/docker run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint stat --init -e CONTAINER_IMAGE=<internal repo>/prometheus/node-exporter:v0.18.1 -e NODE_NAME=openstack-mon01 -e CEPH_USE_RANDOM_NONCE=1 <internal repo>/prometheus/node-exporter:v0.18.1 -c %u %g /etc/prometheus
2022-08-02T09:30:00.000421-0600 mon.openstack-mon01 [WRN] stat: stderr stat: can't stat '/etc/prometheus': No such file or directory
2022-08-02T09:30:00.000430-0600 mon.openstack-mon01 [WRN] Traceback (most recent call last):
2022-08-02T09:30:00.000438-0600 mon.openstack-mon01 [WRN]   File "/var/lib/ceph/52bdc4d8-0dec-11ed-9416-18dbf2954237/cephadm.55e70975756e8c180366666f9fa21d3301c67edc3a5000698fd6e7ccb6fcafee", line 8571, in <module>
2022-08-02T09:30:00.000446-0600 mon.openstack-mon01 [WRN]     main()
2022-08-02T09:30:00.000462-0600 mon.openstack-mon01 [WRN]   File "/var/lib/ceph/52bdc4d8-0dec-11ed-9416-18dbf2954237/cephadm.55e70975756e8c180366666f9fa21d3301c67edc3a5000698fd6e7ccb6fcafee", line 8559, in main
2022-08-02T09:30:00.000470-0600 mon.openstack-mon01 [WRN]     r = ctx.func(ctx)
2022-08-02T09:30:00.000477-0600 mon.openstack-mon01 [WRN]   File "/var/lib/ceph/52bdc4d8-0dec-11ed-9416-18dbf2954237/cephadm.55e70975756e8c180366666f9fa21d3301c67edc3a5000698fd6e7ccb6fcafee", line 1787, in _default_image
2022-08-02T09:30:00.000491-0600 mon.openstack-mon01 [WRN]     return func(ctx)
2022-08-02T09:30:00.000497-0600 mon.openstack-mon01 [WRN]   File "/var/lib/ceph/52bdc4d8-0dec-11ed-9416-18dbf2954237/cephadm.55e70975756e8c180366666f9fa21d3301c67edc3a5000698fd6e7ccb6fcafee", line 4567, in command_deploy
2022-08-02T09:30:00.000506-0600 mon.openstack-mon01 [WRN]     uid, gid = extract_uid_gid_monitoring(ctx, daemon_type)
2022-08-02T09:30:00.000514-0600 mon.openstack-mon01 [WRN]   File "/var/lib/ceph/52bdc4d8-0dec-11ed-9416-18dbf2954237/cephadm.55e70975756e8c180366666f9fa21d3301c67edc3a5000698fd6e7ccb6fcafee", line 4493, in extract_uid_gid_monitoring
2022-08-02T09:30:00.000521-0600 mon.openstack-mon01 [WRN]     uid, gid = extract_uid_gid(ctx, file_path='/etc/prometheus')
2022-08-02T09:30:00.000530-0600 mon.openstack-mon01 [WRN]   File "/var/lib/ceph/52bdc4d8-0dec-11ed-9416-18dbf2954237/cephadm.55e70975756e8c180366666f9fa21d3301c67edc3a5000698fd6e7ccb6fcafee", line 2592, in extract_uid_gid
2022-08-02T09:30:00.000537-0600 mon.openstack-mon01 [WRN]     raise RuntimeError('uid/gid not found')
2022-08-02T09:30:00.000543-0600 mon.openstack-mon01 [WRN] RuntimeError: uid/gid not found
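If I'm reading the traceback right, cephadm works out the daemon's uid/gid by running stat on /etc/prometheus inside the container image, and per the log it is doing that against the node-exporter image, which has no /etc/prometheus. I can reproduce the same check by hand with a stripped-down version of the docker command from the log (the second image path/tag is just a placeholder for whatever prometheus image sits in our mirror):
docker run --rm --entrypoint stat <internal repo>/prometheus/node-exporter:v0.18.1 -c '%u %g' /etc/prometheus   # fails: no such file or directory
docker run --rm --entrypoint stat <internal repo>/prometheus/prometheus:<tag> -c '%u %g' /etc/prometheus        # what I'd expect it to stat for a prometheus daemon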
This is a fresh cluster, and it was created with this command:
cephadm --image <another repo>/ceph/ceph:v16 bootstrap --mon-ip <mon01 ip> --log-to-file --cluster-network <cluster network> --allow-fqdn-hostname --ssh-private-key /root/.ssh/id_rsa --ssh-public-key /root/.ssh/id_rsa.pub
With this being handled through cephadm (i.e., containers), I would expect the directories it needs to already exist in the image.

Also, these hosts don't have access to the internet and will need to pull images from our internal repos.
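My assumption is that I may need to explicitly point cephadm at our mirrored monitoring images, since as far as I know the defaults reference the upstream registries. Something like the following, with the image paths/tags being placeholders for whatever is in our mirror:
ceph config set mgr mgr/cephadm/container_image_prometheus <internal repo>/prometheus/prometheus:<tag>
ceph config set mgr mgr/cephadm/container_image_node_exporter <internal repo>/prometheus/node-exporter:<tag>
ceph config set mgr mgr/cephadm/container_image_alertmanager <internal repo>/prometheus/alertmanager:<tag>
ceph config set mgr mgr/cephadm/container_image_grafana <internal repo>/ceph/ceph-grafana:<tag>
ceph orch daemon redeploy prometheus.openstack-mon01     # then redeploy the failing daemon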

Any pointers here?

Thanks,
Matthew Stroud




