Hi!

> Hmm, I'm not seeing anything that could be a cause in any of that output. I
> did notice, however, from your "ceph orch ls" output that none of your
> services have been refreshed since the 24th. Cephadm typically tries to
> refresh these things every 10 minutes so that signals something is quite
> wrong.

From what I see in /var/log/ceph/cephadm.log, it tries to run the same
command once a minute and does nothing else. That is why the status has not
been updated for 5 days.

> Could you try running "ceph mgr fail" and if nothing seems to be
> resolved could you post "ceph log last 200 debug cephadm". Maybe we can see
> if something gets stuck again after the mgr restarts.

"ceph mgr fail" did not help. "ceph log last 200 debug cephadm" shows the
same error, again and again:

2022-01-28T20:57:12.792090+0000 mgr.s-26-9-24-mon-m2.nhltmq (mgr.129738166) 349 : cephadm [ERR] cephadm exited with an error code: 1, stderr:Pulling container image s-8-2-1:/dev/bcache0...
Non-zero exit code 125 from /usr/bin/podman pull s-8-2-1:/dev/bcache0
/usr/bin/podman: stderr Error: invalid reference format
ERROR: Failed command: /usr/bin/podman pull s-8-2-1:/dev/bcache0
Traceback (most recent call last):
  File "/usr/share/ceph/mgr/cephadm/serve.py", line 1363, in _remote_connection
    yield (conn, connr)
  File "/usr/share/ceph/mgr/cephadm/serve.py", line 1256, in _run_cephadm
    code, '\n'.join(err)))
orchestrator._interface.OrchestratorError: cephadm exited with an error code: 1, stderr:Pulling container image s-8-2-1:/dev/bcache0...
Non-zero exit code 125 from /usr/bin/podman pull s-8-2-1:/dev/bcache0
/usr/bin/podman: stderr Error: invalid reference format
ERROR: Failed command: /usr/bin/podman pull s-8-2-1:/dev/bcache0

2022-01-28T20:58:13.092996+0000 mgr.s-26-9-24-mon-m2.nhltmq (mgr.129738166) 392 : cephadm [ERR] cephadm exited with an error code: 1, stderr:Pulling container image s-8-2-1:/dev/bcache0...
Non-zero exit code 125 from /usr/bin/podman pull s-8-2-1:/dev/bcache0
/usr/bin/podman: stderr Error: invalid reference format
ERROR: Failed command: /usr/bin/podman pull s-8-2-1:/dev/bcache0
Traceback (most recent call last):
  File "/usr/share/ceph/mgr/cephadm/serve.py", line 1363, in _remote_connection
    yield (conn, connr)
  File "/usr/share/ceph/mgr/cephadm/serve.py", line 1256, in _run_cephadm
    code, '\n'.join(err)))
orchestrator._interface.OrchestratorError: cephadm exited with an error code: 1, stderr:Pulling container image s-8-2-1:/dev/bcache0...
Non-zero exit code 125 from /usr/bin/podman pull s-8-2-1:/dev/bcache0
/usr/bin/podman: stderr Error: invalid reference format
ERROR: Failed command: /usr/bin/podman pull s-8-2-1:/dev/bcache0

>
> Thanks,
>
> - Adam King
>
> On Thu, Jan 27, 2022 at 7:06 PM Fyodor Ustinov <ufm@xxxxxx> wrote:
>
>> Hi!
>>
>> I think this happened after I tried to recreate the OSD with the command
>> "ceph orch daemon add osd s-8-2-1:/dev/bcache0"
>>
>> > It looks like cephadm believes "s-8-2-1:/dev/bcache0" is a container
>> > image for some daemon. Can you provide the output of "ceph orch ls
>> > --format yaml",
>>
>> https://pastebin.com/CStBf4J0
>>
>> > "ceph orch upgrade status",
>>
>> root@s-26-9-19-mon-m1:~# ceph orch upgrade status
>> {
>>     "target_image": null,
>>     "in_progress": false,
>>     "services_complete": [],
>>     "progress": null,
>>     "message": ""
>> }
>>
>> > "ceph config get mgr container_image",
>>
>> root@s-26-9-19-mon-m1:~# ceph config get mgr container_image
>> quay.io/ceph/ceph@sha256:2f7f0af8663e73a422f797de605e769ae44eb0297f2a79324739404cc1765728
>>
>> > and the values for monitoring stack container images (format is "ceph
>> > config get mgr mgr/cephadm/container_image_<daemon-type>" where daemon
>> > type is one of "prometheus", "node_exporter", "alertmanager", "grafana",
>> > "haproxy", "keepalived").
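[The per-daemon-type queries described above can be run in one pass. A small
sketch, assuming a working "ceph" CLI on a node with an admin keyring:]

```shell
# Print the configured container image override for each monitoring-stack
# daemon type, per the "mgr/cephadm/container_image_<daemon-type>" format
# suggested above. Sketch only; requires the ceph CLI and admin access.
for d in prometheus node_exporter alertmanager grafana haproxy keepalived; do
    printf '%s: ' "$d"
    ceph config get mgr "mgr/cephadm/container_image_$d"
done
```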
>> quay.io/prometheus/prometheus:v2.18.1
>> quay.io/prometheus/node-exporter:v0.18.1
>> quay.io/prometheus/alertmanager:v0.20.0
>> quay.io/ceph/ceph-grafana:6.7.4
>> docker.io/library/haproxy:2.3
>> docker.io/arcts/keepalived
>>
>> > Thanks,
>> >
>> > - Adam King
>>
>> Thanks a lot!
>>
>> WBR,
>> Fyodor.
>>
>> > On Thu, Jan 27, 2022 at 9:10 AM Fyodor Ustinov <ufm@xxxxxx> wrote:
>> >
>> >> Hi!
>> >>
>> >> I rebooted the nodes with mgr and now I see the following in
>> >> cephadm.log:
>> >>
>> >> As I understand it, cephadm is trying to execute some unsuccessful
>> >> command of mine (I wonder which one); it does not succeed, but it keeps
>> >> trying and trying. How do I stop it from trying?
>> >>
>> >> 2022-01-27 16:02:58,123 7fca7beca740 DEBUG --------------------------------------------------------------------------------
>> >> cephadm ['--image', 's-8-2-1:/dev/bcache0', 'pull']
>> >> 2022-01-27 16:02:58,147 7fca7beca740 DEBUG /usr/bin/podman: 3.3.1
>> >> 2022-01-27 16:02:58,249 7fca7beca740 INFO Pulling container image s-8-2-1:/dev/bcache0...
>> >> 2022-01-27 16:02:58,278 7fca7beca740 DEBUG /usr/bin/podman: Error: invalid reference format
>> >> 2022-01-27 16:02:58,279 7fca7beca740 INFO Non-zero exit code 125 from /usr/bin/podman pull s-8-2-1:/dev/bcache0
>> >> 2022-01-27 16:02:58,279 7fca7beca740 INFO /usr/bin/podman: stderr Error: invalid reference format
>> >> 2022-01-27 16:02:58,279 7fca7beca740 ERROR ERROR: Failed command: /usr/bin/podman pull s-8-2-1:/dev/bcache0
>> >> 2022-01-27 16:03:58,420 7f897a7a6740 DEBUG --------------------------------------------------------------------------------
>> >> cephadm ['--image', 's-8-2-1:/dev/bcache0', 'pull']
>> >> 2022-01-27 16:03:58,443 7f897a7a6740 DEBUG /usr/bin/podman: 3.3.1
>> >> 2022-01-27 16:03:58,547 7f897a7a6740 INFO Pulling container image s-8-2-1:/dev/bcache0...
>> >> 2022-01-27 16:03:58,575 7f897a7a6740 DEBUG /usr/bin/podman: Error: invalid reference format
>> >> 2022-01-27 16:03:58,577 7f897a7a6740 INFO Non-zero exit code 125 from /usr/bin/podman pull s-8-2-1:/dev/bcache0
>> >> 2022-01-27 16:03:58,577 7f897a7a6740 INFO /usr/bin/podman: stderr Error: invalid reference format
>> >> 2022-01-27 16:03:58,577 7f897a7a6740 ERROR ERROR: Failed command: /usr/bin/podman pull s-8-2-1:/dev/bcache0
>> >>
>> >> WBR,
>> >> Fyodor.
>> >> _______________________________________________
>> >> ceph-users mailing list -- ceph-users@xxxxxxx
>> >> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
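[Editorial note for readers hitting the same loop: podman rejects
"s-8-2-1:/dev/bcache0" because it parses as image name "s-8-2-1" with tag
"/dev/bcache0", and slashes are not allowed in a tag, hence "invalid
reference format". The string was evidently persisted somewhere in
cephadm's state as an image reference. A hedged diagnostic sketch for
locating it, using only standard ceph commands on an admin node; the grep
pattern is just this thread's stray value, adjust it for your own:]

```shell
# Search cephadm's persisted state for the stray device path:
# the exported service specs and the mon config-key store.
ceph orch ls --export --format yaml | grep -n 'bcache0'
ceph config-key dump | grep -n 'bcache0'
# Re-check the mgr's own container_image setting as well:
ceph config get mgr container_image
```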