Hi Adam! Big thanks! "ceph config rm osd.91 container_image" completely solved the problem. I don't understand how this happened, but at least now everything works. Thank you so much again!

----- Original Message -----
> From: "Fyodor Ustinov" <ufm@xxxxxx>
> To: "Adam King" <adking@xxxxxxxxxx>
> Cc: "ceph-users" <ceph-users@xxxxxxx>
> Sent: Tuesday, 1 February, 2022 18:12:16
> Subject: Re: cephadm trouble

> Hi!
> YES! HERE IT IS!
>
> global   basic   container_image   quay.io/ceph/ceph@sha256:2f7f0af8663e73a422f797de605e769ae44eb0297f2a79324739404cc1765728   *
> osd.91   basic   container_image   s-8-2-1:/dev/bcache0
>
> Two questions:
> 1. How did it get there?
> 2. How do I delete it? As far as I understand, this field is not editable.
>
>
> ----- Original Message -----
>> From: "Adam King" <adking@xxxxxxxxxx>
>> To: "Fyodor Ustinov" <ufm@xxxxxx>
>> Cc: "ceph-users" <ceph-users@xxxxxxx>
>> Sent: Tuesday, 1 February, 2022 17:45:13
>> Subject: Re: Re: cephadm trouble
>
>> As a follow-up to my previous comment, could you also post "ceph config
>> dump | grep container_image"? It's related to the repo digest thing, and
>> it's another way we could maybe discover where "s-8-2-1:/dev/bcache0" is
>> set as an image.
>>
>> - Adam King
>>
>> On Tue, Feb 1, 2022 at 8:52 AM Adam King <adking@xxxxxxxxxx> wrote:
>>
>>> Hi Fyodor,
>>>
>>> Honestly, I'm quite confused by your case. "daemon add osd" is meant to be a
>>> one-time synchronous command, so the idea that it is causing this repeated
>>> pull is very odd. I think I would need some sort of list of the commands run
>>> on this cluster, or some kind of reproducer. As mentioned before, cephadm
>>> definitely thinks "s-8-2-1:/dev/bcache0" is the name of a container image,
>>> but I can't think of where that is set; I didn't see it in any of the posted
>>> service specs or in the config options for any of the images, but it clearly
>>> must be set somewhere or we wouldn't be trying to pull it repeatedly. I've
>>> never seen an issue like this before. This is a total long shot, but you
>>> could try setting "ceph config set mgr mgr/cephadm/use_repo_digest false"
>>> and see if it at least lets you refresh the daemons and make progress
>>> (or at least gets us different things in the logs).
>>>
>>> Sorry for not being too helpful,
>>>
>>> - Adam King
>>>
>>> On Tue, Feb 1, 2022 at 3:27 AM Fyodor Ustinov <ufm@xxxxxx> wrote:
>>>
>>>> Hi!
>>>>
>>>> No more ideas? :(
>>>>
>>>>
>>>> ----- Original Message -----
>>>> > From: "Fyodor Ustinov" <ufm@xxxxxx>
>>>> > To: "Adam King" <adking@xxxxxxxxxx>
>>>> > Cc: "ceph-users" <ceph-users@xxxxxxx>
>>>> > Sent: Friday, 28 January, 2022 23:02:26
>>>> > Subject: Re: cephadm trouble
>>>>
>>>> > Hi!
>>>> >
>>>> >> Hmm, I'm not seeing anything that could be a cause in any of that output.
>>>> >> I did notice, however, from your "ceph orch ls" output that none of your
>>>> >> services have been refreshed since the 24th. Cephadm typically tries to
>>>> >> refresh these things every 10 minutes, so that signals something is quite
>>>> >> wrong.
>>>> > From what I see in /var/log/ceph/cephadm.log, it tries to run the same
>>>> > command once a minute and does nothing else. That's why the status has not
>>>> > been updated for 5 days.
>>>> >
>>>> >> Could you try running "ceph mgr fail"? If nothing seems to be resolved,
>>>> >> could you post "ceph log last 200 debug cephadm"? Maybe we can see
>>>> >> if something gets stuck again after the mgr restarts.
>>>> > "ceph mgr fail" did not help.
>>>> > "ceph log last 200 debug cephadm" show again and again and again: >>>> > >>>> > 2022-01-28T20:57:12.792090+0000 mgr.s-26-9-24-mon-m2.nhltmq >>>> (mgr.129738166) 349 >>>> > : cephadm [ERR] cephadm exited with an error code: 1, stderr:Pulling >>>> container >>>> > image s-8-2-1:/dev/bcache0... >>>> > Non-zero exit code 125 from /usr/bin/podman pull s-8-2-1:/dev/bcache0 >>>> > /usr/bin/podman: stderr Error: invalid reference format >>>> > ERROR: Failed command: /usr/bin/podman pull s-8-2-1:/dev/bcache0 >>>> > Traceback (most recent call last): >>>> > File "/usr/share/ceph/mgr/cephadm/serve.py", line 1363, in >>>> _remote_connection >>>> > yield (conn, connr) >>>> > File "/usr/share/ceph/mgr/cephadm/serve.py", line 1256, in _run_cephadm >>>> > code, '\n'.join(err))) >>>> > orchestrator._interface.OrchestratorError: cephadm exited with an error >>>> code: 1, >>>> > stderr:Pulling container image s-8-2-1:/dev/bcache0... >>>> > Non-zero exit code 125 from /usr/bin/podman pull s-8-2-1:/dev/bcache0 >>>> > /usr/bin/podman: stderr Error: invalid reference format >>>> > ERROR: Failed command: /usr/bin/podman pull s-8-2-1:/dev/bcache0 >>>> > 2022-01-28T20:58:13.092996+0000 mgr.s-26-9-24-mon-m2.nhltmq >>>> (mgr.129738166) 392 >>>> > : cephadm [ERR] cephadm exited with an error code: 1, stderr:Pulling >>>> container >>>> > image s-8-2-1:/dev/bcache0... >>>> > Non-zero exit code 125 from /usr/bin/podman pull s-8-2-1:/dev/bcache0 >>>> > /usr/bin/podman: stderr Error: invalid reference format >>>> > ERROR: Failed command: /usr/bin/podman pull s-8-2-1:/dev/bcache0 >>>> > Traceback (most recent call last): >>>> > File "/usr/share/ceph/mgr/cephadm/serve.py", line 1363, in >>>> _remote_connection >>>> > yield (conn, connr) >>>> > File "/usr/share/ceph/mgr/cephadm/serve.py", line 1256, in _run_cephadm >>>> > code, '\n'.join(err))) >>>> > orchestrator._interface.OrchestratorError: cephadm exited with an error >>>> code: 1, >>>> > stderr:Pulling container image s-8-2-1:/dev/bcache0... >>>> > Non-zero exit code 125 from /usr/bin/podman pull s-8-2-1:/dev/bcache0 >>>> > /usr/bin/podman: stderr Error: invalid reference format >>>> > ERROR: Failed command: /usr/bin/podman pull s-8-2-1:/dev/bcache0 >>>> > >>>> >> >>>> >> Thanks, >>>> >> >>>> >> - Adam King >>>> >> >>>> >> On Thu, Jan 27, 2022 at 7:06 PM Fyodor Ustinov <ufm@xxxxxx> wrote: >>>> >> >>>> >>> Hi! >>>> >>> >>>> >>> I think this happened after I tried to recreate the osd with the >>>> command >>>> >>> "ceph orch daemon add osd s-8-2-1:/dev/bcache0" >>>> >>> >>>> >>> >>>> >>> > It looks like cephadm believes "s-8-2-1:/dev/bcache0" is a container >>>> >>> image >>>> >>> > for some daemon. 
>>>> >>> > Can you provide the output of "ceph orch ls --format yaml",
>>>> >>>
>>>> >>> https://pastebin.com/CStBf4J0
>>>> >>>
>>>> >>> > "ceph orch upgrade status",
>>>> >>> root@s-26-9-19-mon-m1:~# ceph orch upgrade status
>>>> >>> {
>>>> >>>     "target_image": null,
>>>> >>>     "in_progress": false,
>>>> >>>     "services_complete": [],
>>>> >>>     "progress": null,
>>>> >>>     "message": ""
>>>> >>> }
>>>> >>>
>>>> >>>
>>>> >>> > "ceph config get mgr container_image",
>>>> >>> root@s-26-9-19-mon-m1:~# ceph config get mgr container_image
>>>> >>> quay.io/ceph/ceph@sha256:2f7f0af8663e73a422f797de605e769ae44eb0297f2a79324739404cc1765728
>>>> >>>
>>>> >>>
>>>> >>> > and the values for the monitoring stack container images (the format is
>>>> >>> > "ceph config get mgr mgr/cephadm/container_image_<daemon-type>", where
>>>> >>> > daemon type is one of "prometheus", "node_exporter", "alertmanager",
>>>> >>> > "grafana", "haproxy", "keepalived").
>>>> >>> quay.io/prometheus/prometheus:v2.18.1
>>>> >>> quay.io/prometheus/node-exporter:v0.18.1
>>>> >>> quay.io/prometheus/alertmanager:v0.20.0
>>>> >>> quay.io/ceph/ceph-grafana:6.7.4
>>>> >>> docker.io/library/haproxy:2.3
>>>> >>> docker.io/arcts/keepalived
>>>> >>>
>>>> >>> >
>>>> >>> > Thanks,
>>>> >>> >
>>>> >>> > - Adam King
>>>> >>>
>>>> >>> Thanks a lot!
>>>> >>>
>>>> >>> WBR,
>>>> >>> Fyodor.
>>>> >>>
>>>> >>> >
>>>> >>> > On Thu, Jan 27, 2022 at 9:10 AM Fyodor Ustinov <ufm@xxxxxx> wrote:
>>>> >>> >
>>>> >>> >> Hi!
>>>> >>> >>
>>>> >>> >> I rebooted the nodes running the mgr, and now I see the following in
>>>> >>> >> cephadm.log.
>>>> >>> >>
>>>> >>> >> As I understand it, cephadm is trying to execute some unsuccessful command
>>>> >>> >> of mine (I wonder which one); it does not succeed, but it keeps trying and
>>>> >>> >> trying. How do I stop it from trying?
>>>> >>> >>
>>>> >>> >> 2022-01-27 16:02:58,123 7fca7beca740 DEBUG
>>>> >>> >> --------------------------------------------------------------------------------
>>>> >>> >> cephadm ['--image', 's-8-2-1:/dev/bcache0', 'pull']
>>>> >>> >> 2022-01-27 16:02:58,147 7fca7beca740 DEBUG /usr/bin/podman: 3.3.1
>>>> >>> >> 2022-01-27 16:02:58,249 7fca7beca740 INFO Pulling container image s-8-2-1:/dev/bcache0...
>>>> >>> >> 2022-01-27 16:02:58,278 7fca7beca740 DEBUG /usr/bin/podman: Error: invalid reference format
>>>> >>> >> 2022-01-27 16:02:58,279 7fca7beca740 INFO Non-zero exit code 125 from /usr/bin/podman pull s-8-2-1:/dev/bcache0
>>>> >>> >> 2022-01-27 16:02:58,279 7fca7beca740 INFO /usr/bin/podman: stderr Error: invalid reference format
>>>> >>> >> 2022-01-27 16:02:58,279 7fca7beca740 ERROR ERROR: Failed command: /usr/bin/podman pull s-8-2-1:/dev/bcache0
>>>> >>> >> 2022-01-27 16:03:58,420 7f897a7a6740 DEBUG
>>>> >>> >> --------------------------------------------------------------------------------
>>>> >>> >> cephadm ['--image', 's-8-2-1:/dev/bcache0', 'pull']
>>>> >>> >> 2022-01-27 16:03:58,443 7f897a7a6740 DEBUG /usr/bin/podman: 3.3.1
>>>> >>> >> 2022-01-27 16:03:58,547 7f897a7a6740 INFO Pulling container image s-8-2-1:/dev/bcache0...
>>>> >>> >> 2022-01-27 16:03:58,575 7f897a7a6740 DEBUG /usr/bin/podman: Error: invalid reference format
>>>> >>> >> 2022-01-27 16:03:58,577 7f897a7a6740 INFO Non-zero exit code 125 from /usr/bin/podman pull s-8-2-1:/dev/bcache0
>>>> >>> >> 2022-01-27 16:03:58,577 7f897a7a6740 INFO /usr/bin/podman: stderr Error: invalid reference format
>>>> >>> >> 2022-01-27 16:03:58,577 7f897a7a6740 ERROR ERROR: Failed command: /usr/bin/podman pull s-8-2-1:/dev/bcache0
>>>> >>> >>
>>>> >>> >> WBR,
>>>> >>> >> Fyodor.

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
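For anyone who lands on this thread with the same "invalid reference format" pull loop: the commands below are a minimal sketch of the check-and-fix sequence that resolved it above. The daemon name (osd.91) and the bogus value (s-8-2-1:/dev/bcache0) are specific to this cluster; substitute whatever your own "ceph config dump" output shows.

    # Look for container_image entries whose value is not a real image reference
    ceph config dump | grep container_image

    # Remove the bogus per-daemon override (here it was set on osd.91)
    ceph config rm osd.91 container_image

    # Optionally restart the active mgr so the cephadm module retries right away
    ceph mgr fail

    # The REFRESHED times should start updating again (cephadm refreshes roughly every 10 minutes)
    ceph orch ls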