Hi!

No more ideas? :(

----- Original Message -----
> From: "Fyodor Ustinov" <ufm@xxxxxx>
> To: "Adam King" <adking@xxxxxxxxxx>
> Cc: "ceph-users" <ceph-users@xxxxxxx>
> Sent: Friday, 28 January, 2022 23:02:26
> Subject: Re: cephadm trouble
> Hi!
>
>> Hmm, I'm not seeing anything that could be a cause in any of that output. I
>> did notice, however, from your "ceph orch ls" output that none of your
>> services have been refreshed since the 24th. Cephadm typically tries to
>> refresh these things every 10 minutes so that signals something is quite
>> wrong.
> From what I see in /var/log/ceph/cephadm.log it tries to run the same command
> once a minute and does nothing else. That's why the status has not been updated
> for 5 days.
>
>> Could you try running "ceph mgr fail" and if nothing seems to be
>> resolved could you post "ceph log last 200 debug cephadm". Maybe we can see
>> if something gets stuck again after the mgr restarts.
> "ceph mgr fail" did not help.
> "ceph log last 200 debug cephadm" shows again and again and again:
>
> 2022-01-28T20:57:12.792090+0000 mgr.s-26-9-24-mon-m2.nhltmq (mgr.129738166) 349 : cephadm [ERR] cephadm exited with an error code: 1, stderr:Pulling container image s-8-2-1:/dev/bcache0...
> Non-zero exit code 125 from /usr/bin/podman pull s-8-2-1:/dev/bcache0
> /usr/bin/podman: stderr Error: invalid reference format
> ERROR: Failed command: /usr/bin/podman pull s-8-2-1:/dev/bcache0
> Traceback (most recent call last):
>   File "/usr/share/ceph/mgr/cephadm/serve.py", line 1363, in _remote_connection
>     yield (conn, connr)
>   File "/usr/share/ceph/mgr/cephadm/serve.py", line 1256, in _run_cephadm
>     code, '\n'.join(err)))
> orchestrator._interface.OrchestratorError: cephadm exited with an error code: 1, stderr:Pulling container image s-8-2-1:/dev/bcache0...
> Non-zero exit code 125 from /usr/bin/podman pull s-8-2-1:/dev/bcache0
> /usr/bin/podman: stderr Error: invalid reference format
> ERROR: Failed command: /usr/bin/podman pull s-8-2-1:/dev/bcache0
> 2022-01-28T20:58:13.092996+0000 mgr.s-26-9-24-mon-m2.nhltmq (mgr.129738166) 392 : cephadm [ERR] cephadm exited with an error code: 1, stderr:Pulling container image s-8-2-1:/dev/bcache0...
> Non-zero exit code 125 from /usr/bin/podman pull s-8-2-1:/dev/bcache0
> /usr/bin/podman: stderr Error: invalid reference format
> ERROR: Failed command: /usr/bin/podman pull s-8-2-1:/dev/bcache0
> Traceback (most recent call last):
>   File "/usr/share/ceph/mgr/cephadm/serve.py", line 1363, in _remote_connection
>     yield (conn, connr)
>   File "/usr/share/ceph/mgr/cephadm/serve.py", line 1256, in _run_cephadm
>     code, '\n'.join(err)))
> orchestrator._interface.OrchestratorError: cephadm exited with an error code: 1, stderr:Pulling container image s-8-2-1:/dev/bcache0...
> Non-zero exit code 125 from /usr/bin/podman pull s-8-2-1:/dev/bcache0
> /usr/bin/podman: stderr Error: invalid reference format
> ERROR: Failed command: /usr/bin/podman pull s-8-2-1:/dev/bcache0
>
>>
>> Thanks,
>>
>> - Adam King
>>
>> On Thu, Jan 27, 2022 at 7:06 PM Fyodor Ustinov <ufm@xxxxxx> wrote:
>>
>>> Hi!
>>>
>>> I think this happened after I tried to recreate the osd with the command
>>> "ceph orch daemon add osd s-8-2-1:/dev/bcache0"
>>>
>>>
>>> > It looks like cephadm believes "s-8-2-1:/dev/bcache0" is a container image
>>> > for some daemon.
>>> > Can you provide the output of "ceph orch ls --format yaml",
>>>
>>> https://pastebin.com/CStBf4J0
>>>
>>> > "ceph orch upgrade status",
>>> root@s-26-9-19-mon-m1:~# ceph orch upgrade status
>>> {
>>>     "target_image": null,
>>>     "in_progress": false,
>>>     "services_complete": [],
>>>     "progress": null,
>>>     "message": ""
>>> }
>>>
>>>
>>> > "ceph config get mgr container_image",
>>> root@s-26-9-19-mon-m1:~# ceph config get mgr container_image
>>>
>>> quay.io/ceph/ceph@sha256:2f7f0af8663e73a422f797de605e769ae44eb0297f2a79324739404cc1765728
>>>
>>>
>>> > and the values for monitoring stack container images (format is "ceph
>>> > config get mgr mgr/cephadm/container_image_<daemon-type>" where daemon type
>>> > is one of "prometheus", "node_exporter", "alertmanager", "grafana",
>>> > "haproxy", "keepalived").
>>> quay.io/prometheus/prometheus:v2.18.1
>>> quay.io/prometheus/node-exporter:v0.18.1
>>> quay.io/prometheus/alertmanager:v0.20.0
>>> quay.io/ceph/ceph-grafana:6.7.4
>>> docker.io/library/haproxy:2.3
>>> docker.io/arcts/keepalived
>>>
>>> >
>>> > Thanks,
>>> >
>>> > - Adam King
>>>
>>> Thanks a lot!
>>>
>>> WBR,
>>> Fyodor.
>>>
>>> >
>>> > On Thu, Jan 27, 2022 at 9:10 AM Fyodor Ustinov <ufm@xxxxxx> wrote:
>>> >
>>> >> Hi!
>>> >>
>>> >> I rebooted the nodes with mgr and now I see the following in the
>>> >> cephadm.log:
>>> >>
>>> >> As I understand it - cephadm is trying to execute some unsuccessful
>>> >> command of mine (I wonder which one), it does not succeed, but it keeps
>>> >> trying and trying. How do I stop it from trying?
>>> >>
>>> >> 2022-01-27 16:02:58,123 7fca7beca740 DEBUG --------------------------------------------------------------------------------
>>> >> cephadm ['--image', 's-8-2-1:/dev/bcache0', 'pull']
>>> >> 2022-01-27 16:02:58,147 7fca7beca740 DEBUG /usr/bin/podman: 3.3.1
>>> >> 2022-01-27 16:02:58,249 7fca7beca740 INFO Pulling container image s-8-2-1:/dev/bcache0...
>>> >> 2022-01-27 16:02:58,278 7fca7beca740 DEBUG /usr/bin/podman: Error: invalid reference format
>>> >> 2022-01-27 16:02:58,279 7fca7beca740 INFO Non-zero exit code 125 from /usr/bin/podman pull s-8-2-1:/dev/bcache0
>>> >> 2022-01-27 16:02:58,279 7fca7beca740 INFO /usr/bin/podman: stderr Error: invalid reference format
>>> >> 2022-01-27 16:02:58,279 7fca7beca740 ERROR ERROR: Failed command: /usr/bin/podman pull s-8-2-1:/dev/bcache0
>>> >> 2022-01-27 16:03:58,420 7f897a7a6740 DEBUG --------------------------------------------------------------------------------
>>> >> cephadm ['--image', 's-8-2-1:/dev/bcache0', 'pull']
>>> >> 2022-01-27 16:03:58,443 7f897a7a6740 DEBUG /usr/bin/podman: 3.3.1
>>> >> 2022-01-27 16:03:58,547 7f897a7a6740 INFO Pulling container image s-8-2-1:/dev/bcache0...
>>> >> 2022-01-27 16:03:58,575 7f897a7a6740 DEBUG /usr/bin/podman: Error: invalid reference format
>>> >> 2022-01-27 16:03:58,577 7f897a7a6740 INFO Non-zero exit code 125 from /usr/bin/podman pull s-8-2-1:/dev/bcache0
>>> >> 2022-01-27 16:03:58,577 7f897a7a6740 INFO /usr/bin/podman: stderr Error: invalid reference format
>>> >> 2022-01-27 16:03:58,577 7f897a7a6740 ERROR ERROR: Failed command: /usr/bin/podman pull s-8-2-1:/dev/bcache0
>>> >>
>>> >> WBR,
>>> >> Fyodor.
>>> >> _______________________________________________
>>> >> ceph-users mailing list -- ceph-users@xxxxxxx
>>> >> To unsubscribe send an email to ceph-users-leave@xxxxxxx