Re: cephadm trouble

Hi!

No more ideas? :(


----- Original Message -----
> From: "Fyodor Ustinov" <ufm@xxxxxx>
> To: "Adam King" <adking@xxxxxxxxxx>
> Cc: "ceph-users" <ceph-users@xxxxxxx>
> Sent: Friday, 28 January, 2022 23:02:26
> Subject:  Re: cephadm trouble

> Hi!
> 
>> Hmm, I'm not seeing anything that could be a cause in any of that output. I
>> did notice, however, from your "ceph orch ls" output that none of your
>> services have been refreshed since the 24th. Cephadm typically tries to
>> refresh these things every 10 minutes so that signals something is quite
>> wrong.
> From what I see in /var/log/ceph/cephadm.log it tries to run the same command
> once a minute and does nothing else. That's why the status has not been updated
> for 5 days.
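
The once-a-minute retry can be confirmed directly from the log. The following is a minimal Python sketch, assuming unwrapped cephadm.log lines in the format quoted further down in this thread (timestamp, thread id, level, message). The function name `failed_command_intervals` and the sample lines are only illustrative; they are not part of cephadm.

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches lines such as:
# 2022-01-27 16:02:58,279 7fca7beca740 ERROR ERROR: Failed command: /usr/bin/podman pull ...
FAILED = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) \S+ ERROR "
    r"ERROR: Failed command: (?P<cmd>.+)$"
)


def failed_command_intervals(lines):
    """Map each failed command to the gaps (in seconds) between its failures."""
    stamps = defaultdict(list)
    for line in lines:
        m = FAILED.match(line.strip())
        if m:
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
            stamps[m.group("cmd")].append(ts)
    return {
        cmd: [(b - a).total_seconds() for a, b in zip(times, times[1:])]
        for cmd, times in stamps.items()
    }


# Sample lines taken from the log excerpt in this thread (unwrapped).
SAMPLE = [
    "2022-01-27 16:02:58,279 7fca7beca740 ERROR ERROR: Failed command: "
    "/usr/bin/podman pull s-8-2-1:/dev/bcache0",
    "2022-01-27 16:03:58,577 7f897a7a6740 ERROR ERROR: Failed command: "
    "/usr/bin/podman pull s-8-2-1:/dev/bcache0",
]

if __name__ == "__main__":
    for cmd, gaps in failed_command_intervals(SAMPLE).items():
        print(cmd, gaps)
```

For the two sample entries the gap is roughly one minute, which matches the retry pattern described above.
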
> 
>> Could you try running "ceph mgr fail" and if nothing seems to be
>> resolved could you post "ceph log last 200 debug cephadm". Maybe we can see
>> if something gets stuck again after the mgr restarts.
> "ceph mgr fail" did not help.
> "ceph log last 200 debug cephadm" shows the same error again and again:
> 
> 2022-01-28T20:57:12.792090+0000 mgr.s-26-9-24-mon-m2.nhltmq (mgr.129738166) 349
> : cephadm [ERR] cephadm exited with an error code: 1, stderr:Pulling container
> image s-8-2-1:/dev/bcache0...
> Non-zero exit code 125 from /usr/bin/podman pull s-8-2-1:/dev/bcache0
> /usr/bin/podman: stderr Error: invalid reference format
> ERROR: Failed command: /usr/bin/podman pull s-8-2-1:/dev/bcache0
> Traceback (most recent call last):
>  File "/usr/share/ceph/mgr/cephadm/serve.py", line 1363, in _remote_connection
>    yield (conn, connr)
>  File "/usr/share/ceph/mgr/cephadm/serve.py", line 1256, in _run_cephadm
>    code, '\n'.join(err)))
> orchestrator._interface.OrchestratorError: cephadm exited with an error code: 1,
> stderr:Pulling container image s-8-2-1:/dev/bcache0...
> Non-zero exit code 125 from /usr/bin/podman pull s-8-2-1:/dev/bcache0
> /usr/bin/podman: stderr Error: invalid reference format
> ERROR: Failed command: /usr/bin/podman pull s-8-2-1:/dev/bcache0
> 2022-01-28T20:58:13.092996+0000 mgr.s-26-9-24-mon-m2.nhltmq (mgr.129738166) 392
> : cephadm [ERR] cephadm exited with an error code: 1, stderr:Pulling container
> image s-8-2-1:/dev/bcache0...
> Non-zero exit code 125 from /usr/bin/podman pull s-8-2-1:/dev/bcache0
> /usr/bin/podman: stderr Error: invalid reference format
> ERROR: Failed command: /usr/bin/podman pull s-8-2-1:/dev/bcache0
> Traceback (most recent call last):
>  File "/usr/share/ceph/mgr/cephadm/serve.py", line 1363, in _remote_connection
>    yield (conn, connr)
>  File "/usr/share/ceph/mgr/cephadm/serve.py", line 1256, in _run_cephadm
>    code, '\n'.join(err)))
> orchestrator._interface.OrchestratorError: cephadm exited with an error code: 1,
> stderr:Pulling container image s-8-2-1:/dev/bcache0...
> Non-zero exit code 125 from /usr/bin/podman pull s-8-2-1:/dev/bcache0
> /usr/bin/podman: stderr Error: invalid reference format
> ERROR: Failed command: /usr/bin/podman pull s-8-2-1:/dev/bcache0
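
As for why podman answers "invalid reference format": in an image reference the part after the colon is a tag, and a tag cannot contain "/". The Python sketch below is a simplified approximation of the usual container image reference grammar (name, optional tag, optional digest). It is an assumption for illustration and not podman's actual parser; the function name `looks_like_image_reference` is hypothetical.

```python
import re

# Simplified building blocks of an image reference (approximation only).
ALNUM = r"[a-z0-9]+"
SEP = r"(?:[._]|__|-+)"
PATH_COMPONENT = rf"{ALNUM}(?:{SEP}{ALNUM})*"
DOMAIN_COMPONENT = r"(?:[a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9-]*[a-zA-Z0-9])"
DOMAIN = rf"{DOMAIN_COMPONENT}(?:\.{DOMAIN_COMPONENT})*(?::[0-9]+)?"
NAME = rf"(?:{DOMAIN}/)?{PATH_COMPONENT}(?:/{PATH_COMPONENT})*"
# A tag starts with a word character and may not contain "/".
TAG = r"[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}"
DIGEST = r"[A-Za-z][A-Za-z0-9]*(?:[-_+.][A-Za-z][A-Za-z0-9]*)*:[0-9a-fA-F]{32,}"
REFERENCE = re.compile(rf"{NAME}(?::{TAG})?(?:@{DIGEST})?")


def looks_like_image_reference(value):
    """Return True if the whole string fits the simplified reference grammar."""
    return REFERENCE.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in (
        "s-8-2-1:/dev/bcache0",
        "quay.io/prometheus/prometheus:v2.18.1",
        "docker.io/arcts/keepalived",
    ):
        print(candidate, looks_like_image_reference(candidate))
```

Under this approximation the host:device string from "ceph orch daemon add osd" is rejected because "/dev/bcache0" is not a valid tag, while the configured monitoring images pass.
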
> 
>> 
>> Thanks,
>> 
>> - Adam King
>> 
>> On Thu, Jan 27, 2022 at 7:06 PM Fyodor Ustinov <ufm@xxxxxx> wrote:
>> 
>>> Hi!
>>>
>>> I think this happened after I tried to recreate the osd with the command
>>> "ceph orch daemon add osd s-8-2-1:/dev/bcache0"
>>>
>>>
>>> > It looks like cephadm believes "s-8-2-1:/dev/bcache0" is a container
>>> > image for some daemon. Can you provide the output of "ceph orch ls
>>> > --format yaml",
>>>
>>> https://pastebin.com/CStBf4J0
>>>
>>> > "ceph orch upgrade status",
>>> root@s-26-9-19-mon-m1:~# ceph orch upgrade status
>>> {
>>>     "target_image": null,
>>>     "in_progress": false,
>>>     "services_complete": [],
>>>     "progress": null,
>>>     "message": ""
>>> }
>>>
>>>
>>> > "ceph config get mgr container_image",
>>> root@s-26-9-19-mon-m1:~# ceph config get mgr container_image
>>>
>>> quay.io/ceph/ceph@sha256:2f7f0af8663e73a422f797de605e769ae44eb0297f2a79324739404cc1765728
>>>
>>>
>>> > and the values for monitoring stack container images (format is "ceph
>>> > config get mgr mgr/cephadm/container_image_<daemon-type>" where daemon
>>> > type is one of "prometheus", "node_exporter", "alertmanager", "grafana",
>>> > "haproxy", "keepalived").
>>> quay.io/prometheus/prometheus:v2.18.1
>>> quay.io/prometheus/node-exporter:v0.18.1
>>> quay.io/prometheus/alertmanager:v0.20.0
>>> quay.io/ceph/ceph-grafana:6.7.4
>>> docker.io/library/haproxy:2.3
>>> docker.io/arcts/keepalived
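
For anyone repeating this check, the per-daemon queries can be generated rather than typed by hand. This is a small Python sketch that only builds and prints the command lines quoted above; it does not run them. The helper name `image_config_commands` is hypothetical, and the daemon types are the six listed in the question.

```python
import shlex

# Daemon types named in the request above.
DAEMON_TYPES = [
    "prometheus",
    "node_exporter",
    "alertmanager",
    "grafana",
    "haproxy",
    "keepalived",
]


def image_config_commands(daemon_types=DAEMON_TYPES):
    """Build the argv lists for the image-related 'ceph config get' queries."""
    commands = [["ceph", "config", "get", "mgr", "container_image"]]
    for daemon_type in daemon_types:
        key = f"mgr/cephadm/container_image_{daemon_type}"
        commands.append(["ceph", "config", "get", "mgr", key])
    return commands


if __name__ == "__main__":
    # Print the commands so they can be reviewed or pasted into a shell.
    for argv in image_config_commands():
        print(shlex.join(argv))
```
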
>>>
>>> >
>>> > Thanks,
>>> >
>>> > - Adam King
>>>
>>> Thanks a lot!
>>>
>>> WBR,
>>>     Fyodor.
>>>
>>> >
>>> > On Thu, Jan 27, 2022 at 9:10 AM Fyodor Ustinov <ufm@xxxxxx> wrote:
>>> >
>>> >> Hi!
>>> >>
>>> >> I rebooted the nodes with mgr and now I see the following in the
>>> >> cephadm.log:
>>> >>
>>> >> As I understand it - cephadm is trying to execute some unsuccessful
>>> >> command of mine (I wonder which one), it does not succeed, but it keeps
>>> >> trying and trying. How do I stop it from trying?
>>> >>
>>> >> 2022-01-27 16:02:58,123 7fca7beca740 DEBUG --------------------------------------------------------------------------------
>>> >> cephadm ['--image', 's-8-2-1:/dev/bcache0', 'pull']
>>> >> 2022-01-27 16:02:58,147 7fca7beca740 DEBUG /usr/bin/podman: 3.3.1
>>> >> 2022-01-27 16:02:58,249 7fca7beca740 INFO Pulling container image s-8-2-1:/dev/bcache0...
>>> >> 2022-01-27 16:02:58,278 7fca7beca740 DEBUG /usr/bin/podman: Error: invalid reference format
>>> >> 2022-01-27 16:02:58,279 7fca7beca740 INFO Non-zero exit code 125 from /usr/bin/podman pull s-8-2-1:/dev/bcache0
>>> >> 2022-01-27 16:02:58,279 7fca7beca740 INFO /usr/bin/podman: stderr Error: invalid reference format
>>> >> 2022-01-27 16:02:58,279 7fca7beca740 ERROR ERROR: Failed command: /usr/bin/podman pull s-8-2-1:/dev/bcache0
>>> >> 2022-01-27 16:03:58,420 7f897a7a6740 DEBUG --------------------------------------------------------------------------------
>>> >> cephadm ['--image', 's-8-2-1:/dev/bcache0', 'pull']
>>> >> 2022-01-27 16:03:58,443 7f897a7a6740 DEBUG /usr/bin/podman: 3.3.1
>>> >> 2022-01-27 16:03:58,547 7f897a7a6740 INFO Pulling container image s-8-2-1:/dev/bcache0...
>>> >> 2022-01-27 16:03:58,575 7f897a7a6740 DEBUG /usr/bin/podman: Error: invalid reference format
>>> >> 2022-01-27 16:03:58,577 7f897a7a6740 INFO Non-zero exit code 125 from /usr/bin/podman pull s-8-2-1:/dev/bcache0
>>> >> 2022-01-27 16:03:58,577 7f897a7a6740 INFO /usr/bin/podman: stderr Error: invalid reference format
>>> >> 2022-01-27 16:03:58,577 7f897a7a6740 ERROR ERROR: Failed command: /usr/bin/podman pull s-8-2-1:/dev/bcache0
>>> >>
>>> >> WBR,
>>> >>     Fyodor.
>>> >> _______________________________________________
>>> >> ceph-users mailing list -- ceph-users@xxxxxxx
>>> >> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>> >>
>>>


