Re: cephadm trouble

Hi!

> Hmm, I'm not seeing anything that could be a cause in any of that output. I
> did notice, however, from your "ceph orch ls" output that none of your
> services have been refreshed since the 24th. Cephadm typically tries to
> refresh these things every 10 minutes so that signals something is quite
> wrong. 
From what I see in /var/log/ceph/cephadm.log, it tries to run the same command once a minute and does nothing else. That's why the status has not been updated for 5 days.
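
The once-a-minute retry can be read off the log timestamps. Below is a minimal sketch, assuming the cephadm.log line layout seen in this thread (a timestamped DEBUG separator line followed by the "cephadm [...]" argument list). The function name pull_intervals and the SAMPLE excerpt are illustrative only and not part of cephadm.

```python
from datetime import datetime

# Excerpt shaped like the cephadm.log entries quoted in this thread.
SAMPLE = """\
2022-01-27 16:02:58,123 7fca7beca740 DEBUG --------
cephadm ['--image', 's-8-2-1:/dev/bcache0', 'pull']
2022-01-27 16:03:58,420 7f897a7a6740 DEBUG --------
cephadm ['--image', 's-8-2-1:/dev/bcache0', 'pull']
"""


def pull_intervals(log_text):
    """Return the seconds elapsed between consecutive 'pull' invocations."""
    stamps = []
    previous = None  # most recent timestamp seen on any log line
    for line in log_text.splitlines():
        # The argument list has no timestamp of its own, so attribute it
        # to the last timestamped line that preceded it.
        if line.startswith("cephadm [") and "'pull'" in line and previous is not None:
            stamps.append(previous)
        try:
            previous = datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S,%f")
        except ValueError:
            pass  # not a timestamped line; keep the previous timestamp
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    print(pull_intervals(SAMPLE))
```

On the excerpt above, the two pull attempts are roughly 60 seconds apart, which matches the retry cadence described here.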

> Could you try running "ceph mgr fail" and if nothing seems to be
> resolved could you post "ceph log last 200 debug cephadm". Maybe we can see
> if something gets stuck again after the mgr restarts.
"ceph mgr fail" did not help.
"ceph log last 200 debug cephadm" shows the same error over and over:

2022-01-28T20:57:12.792090+0000 mgr.s-26-9-24-mon-m2.nhltmq (mgr.129738166) 349 : cephadm [ERR] cephadm exited with an error code: 1, stderr:Pulling container image s-8-2-1:/dev/bcache0...
Non-zero exit code 125 from /usr/bin/podman pull s-8-2-1:/dev/bcache0
/usr/bin/podman: stderr Error: invalid reference format
ERROR: Failed command: /usr/bin/podman pull s-8-2-1:/dev/bcache0
Traceback (most recent call last):
  File "/usr/share/ceph/mgr/cephadm/serve.py", line 1363, in _remote_connection
    yield (conn, connr)
  File "/usr/share/ceph/mgr/cephadm/serve.py", line 1256, in _run_cephadm
    code, '\n'.join(err)))
orchestrator._interface.OrchestratorError: cephadm exited with an error code: 1, stderr:Pulling container image s-8-2-1:/dev/bcache0...
Non-zero exit code 125 from /usr/bin/podman pull s-8-2-1:/dev/bcache0
/usr/bin/podman: stderr Error: invalid reference format
ERROR: Failed command: /usr/bin/podman pull s-8-2-1:/dev/bcache0
2022-01-28T20:58:13.092996+0000 mgr.s-26-9-24-mon-m2.nhltmq (mgr.129738166) 392 : cephadm [ERR] cephadm exited with an error code: 1, stderr:Pulling container image s-8-2-1:/dev/bcache0...
Non-zero exit code 125 from /usr/bin/podman pull s-8-2-1:/dev/bcache0
/usr/bin/podman: stderr Error: invalid reference format
ERROR: Failed command: /usr/bin/podman pull s-8-2-1:/dev/bcache0
Traceback (most recent call last):
  File "/usr/share/ceph/mgr/cephadm/serve.py", line 1363, in _remote_connection
    yield (conn, connr)
  File "/usr/share/ceph/mgr/cephadm/serve.py", line 1256, in _run_cephadm
    code, '\n'.join(err)))
orchestrator._interface.OrchestratorError: cephadm exited with an error code: 1, stderr:Pulling container image s-8-2-1:/dev/bcache0...
Non-zero exit code 125 from /usr/bin/podman pull s-8-2-1:/dev/bcache0
/usr/bin/podman: stderr Error: invalid reference format
ERROR: Failed command: /usr/bin/podman pull s-8-2-1:/dev/bcache0
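
For context on the "invalid reference format" error: the string "s-8-2-1:/dev/bcache0" is a host:device pair, not an image name. Read as an image reference, everything after the colon would have to be a tag, and a tag cannot contain "/". Below is a minimal sketch of that check, using a simplified approximation of the container image reference grammar. It is not podman's actual parser, and the name is_valid_reference is hypothetical.

```python
import re

# Simplified approximation of the image reference grammar:
#   [domain[:port]/]component[/component...][:tag][@digest]
# This is an illustration only, not the parser podman uses.
DOMAIN = r"[A-Za-z0-9](?:[A-Za-z0-9.-]*[A-Za-z0-9])?(?::[0-9]+)?"
COMPONENT = r"[a-z0-9]+(?:(?:[._]|__|-+)[a-z0-9]+)*"
TAG = r"[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}"
DIGEST = r"[A-Za-z][A-Za-z0-9]*:[0-9a-fA-F]{32,}"

REFERENCE = re.compile(
    rf"^(?:{DOMAIN}/)?{COMPONENT}(?:/{COMPONENT})*(?::{TAG})?(?:@{DIGEST})?$"
)


def is_valid_reference(ref):
    """Return True if ref looks like a well-formed image reference."""
    return REFERENCE.match(ref) is not None


if __name__ == "__main__":
    for ref in (
        "s-8-2-1:/dev/bcache0",                   # host:device, not an image
        "quay.io/prometheus/prometheus:v2.18.1",  # image from this thread
        "docker.io/arcts/keepalived",             # image from this thread
    ):
        print(ref, is_valid_reference(ref))
```

Under this approximation the host:device string is rejected while the monitoring stack images listed in this thread are accepted, which is consistent with podman refusing the pull before it contacts any registry.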

> 
> Thanks,
> 
> - Adam King
> 
> On Thu, Jan 27, 2022 at 7:06 PM Fyodor Ustinov <ufm@xxxxxx> wrote:
> 
>> Hi!
>>
>> I think this happened after I tried to recreate the osd with the command
>> "ceph orch daemon add osd s-8-2-1:/dev/bcache0"
>>
>>
>> > It looks like cephadm believes "s-8-2-1:/dev/bcache0" is a container image
>> > for some daemon. Can you provide the output of "ceph orch ls --format
>> > yaml",
>>
>> https://pastebin.com/CStBf4J0
>>
>> > "ceph orch upgrade status",
>> root@s-26-9-19-mon-m1:~# ceph orch upgrade status
>> {
>>     "target_image": null,
>>     "in_progress": false,
>>     "services_complete": [],
>>     "progress": null,
>>     "message": ""
>> }
>>
>>
>> > "ceph config get mgr container_image",
>> root@s-26-9-19-mon-m1:~# ceph config get mgr container_image
>>
>> quay.io/ceph/ceph@sha256:2f7f0af8663e73a422f797de605e769ae44eb0297f2a79324739404cc1765728
>>
>>
>> > and the values for monitoring stack container images (format is "ceph
>> > config get mgr mgr/cephadm/container_image_<daemon-type>" where daemon type
>> > is one of "prometheus", "node_exporter", "alertmanager", "grafana",
>> > "haproxy", "keepalived").
>> quay.io/prometheus/prometheus:v2.18.1
>> quay.io/prometheus/node-exporter:v0.18.1
>> quay.io/prometheus/alertmanager:v0.20.0
>> quay.io/ceph/ceph-grafana:6.7.4
>> docker.io/library/haproxy:2.3
>> docker.io/arcts/keepalived
>>
>> >
>> > Thanks,
>> >
>> > - Adam King
>>
>> Thanks a lot!
>>
>> WBR,
>>     Fyodor.
>>
>> >
>> > On Thu, Jan 27, 2022 at 9:10 AM Fyodor Ustinov <ufm@xxxxxx> wrote:
>> >
>> >> Hi!
>> >>
>> >> I rebooted the nodes with mgr and now I see the following in the
>> >> cephadm.log:
>> >>
>> >> As I understand it - cephadm is trying to execute some unsuccessful
>> >> command of mine (I wonder which one), it does not succeed, but it keeps
>> >> trying and trying. How do I stop it from trying?
>> >>
>> >> 2022-01-27 16:02:58,123 7fca7beca740 DEBUG --------------------------------------------------------------------------------
>> >> cephadm ['--image', 's-8-2-1:/dev/bcache0', 'pull']
>> >> 2022-01-27 16:02:58,147 7fca7beca740 DEBUG /usr/bin/podman: 3.3.1
>> >> 2022-01-27 16:02:58,249 7fca7beca740 INFO Pulling container image s-8-2-1:/dev/bcache0...
>> >> 2022-01-27 16:02:58,278 7fca7beca740 DEBUG /usr/bin/podman: Error: invalid reference format
>> >> 2022-01-27 16:02:58,279 7fca7beca740 INFO Non-zero exit code 125 from /usr/bin/podman pull s-8-2-1:/dev/bcache0
>> >> 2022-01-27 16:02:58,279 7fca7beca740 INFO /usr/bin/podman: stderr Error: invalid reference format
>> >> 2022-01-27 16:02:58,279 7fca7beca740 ERROR ERROR: Failed command: /usr/bin/podman pull s-8-2-1:/dev/bcache0
>> >> 2022-01-27 16:03:58,420 7f897a7a6740 DEBUG --------------------------------------------------------------------------------
>> >> cephadm ['--image', 's-8-2-1:/dev/bcache0', 'pull']
>> >> 2022-01-27 16:03:58,443 7f897a7a6740 DEBUG /usr/bin/podman: 3.3.1
>> >> 2022-01-27 16:03:58,547 7f897a7a6740 INFO Pulling container image s-8-2-1:/dev/bcache0...
>> >> 2022-01-27 16:03:58,575 7f897a7a6740 DEBUG /usr/bin/podman: Error: invalid reference format
>> >> 2022-01-27 16:03:58,577 7f897a7a6740 INFO Non-zero exit code 125 from /usr/bin/podman pull s-8-2-1:/dev/bcache0
>> >> 2022-01-27 16:03:58,577 7f897a7a6740 INFO /usr/bin/podman: stderr Error: invalid reference format
>> >> 2022-01-27 16:03:58,577 7f897a7a6740 ERROR ERROR: Failed command: /usr/bin/podman pull s-8-2-1:/dev/bcache0
>> >>
>> >> WBR,
>> >>     Fyodor.
>> >> _______________________________________________
>> >> ceph-users mailing list -- ceph-users@xxxxxxx
>> >> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>> >>
>>


