Hi!

YES! HERE IT IS!

global   basic   container_image   quay.io/ceph/ceph@sha256:2f7f0af8663e73a422f797de605e769ae44eb0297f2a79324739404cc1765728   *
osd.91   basic   container_image   s-8-2-1:/dev/bcache0

Two questions:
1. How did it get there?
2. How do I delete it? As far as I understand, this field is not editable.

----- Original Message -----
> From: "Adam King" <adking@xxxxxxxxxx>
> To: "Fyodor Ustinov" <ufm@xxxxxx>
> Cc: "ceph-users" <ceph-users@xxxxxxx>
> Sent: Tuesday, 1 February, 2022 17:45:13
> Subject: Re: Re: cephadm trouble
>
> As a follow-up to my previous comment, could you also post "ceph config
> dump | grep container_image"? It's related to the repo digest thing, and
> it's another way we could maybe discover where "s-8-2-1:/dev/bcache0" is
> set as an image.
>
> - Adam King
>
> On Tue, Feb 1, 2022 at 8:52 AM Adam King <adking@xxxxxxxxxx> wrote:
>
>> Hi Fyodor,
>>
>> Honestly, I'm super confused by your case. Daemon add osd is meant to be a
>> one-time synchronous command, so the idea that it is causing this repeated
>> pull in this fashion is super odd. I think I would need some sort of list
>> of the commands run on this cluster, or some type of reproducer. As
>> mentioned before, cephadm definitely thinks "s-8-2-1:/dev/bcache0" is the
>> name of a container image, but I can't think of where that is set; I
>> didn't see it in any of the posted service specs or the config options for
>> any of the images, yet it clearly must be set somewhere or we wouldn't be
>> trying to pull it repeatedly. I've never seen an issue like this before.
>> This is a total long shot, but you could try setting "ceph config set mgr
>> mgr/cephadm/use_repo_digest false" and see if it at least lets you refresh
>> the daemons and make progress (or at least gets us different things in the
>> logs).
>>
>> Sorry for not being too helpful,
>>
>> - Adam King
>>
>> On Tue, Feb 1, 2022 at 3:27 AM Fyodor Ustinov <ufm@xxxxxx> wrote:
>>
>>> Hi!
>>>
>>> No more ideas? :(
>>>
>>> ----- Original Message -----
>>> > From: "Fyodor Ustinov" <ufm@xxxxxx>
>>> > To: "Adam King" <adking@xxxxxxxxxx>
>>> > Cc: "ceph-users" <ceph-users@xxxxxxx>
>>> > Sent: Friday, 28 January, 2022 23:02:26
>>> > Subject: Re: cephadm trouble
>>>
>>> > Hi!
>>> >
>>> >> Hmm, I'm not seeing anything that could be a cause in any of that
>>> >> output. I did notice, however, from your "ceph orch ls" output that
>>> >> none of your services have been refreshed since the 24th. Cephadm
>>> >> typically tries to refresh these things every 10 minutes, so that
>>> >> signals something is quite wrong.
>>> > From what I see in /var/log/ceph/cephadm.log, it tries to run the same
>>> > command once a minute and does nothing else. That's why the status has
>>> > not been updated for 5 days.
>>> >
>>> >> Could you try running "ceph mgr fail", and if nothing seems to be
>>> >> resolved, could you post "ceph log last 200 debug cephadm"? Maybe we
>>> >> can see if something gets stuck again after the mgr restarts.
>>> > "ceph mgr fail" did not help.
>>> > "ceph log last 200 debug cephadm" shows again and again and again:
>>> >
>>> > 2022-01-28T20:57:12.792090+0000 mgr.s-26-9-24-mon-m2.nhltmq (mgr.129738166) 349 : cephadm [ERR] cephadm exited with an error code: 1, stderr:Pulling container image s-8-2-1:/dev/bcache0...
>>> > Non-zero exit code 125 from /usr/bin/podman pull s-8-2-1:/dev/bcache0
>>> > /usr/bin/podman: stderr Error: invalid reference format
>>> > ERROR: Failed command: /usr/bin/podman pull s-8-2-1:/dev/bcache0
>>> > Traceback (most recent call last):
>>> >   File "/usr/share/ceph/mgr/cephadm/serve.py", line 1363, in _remote_connection
>>> >     yield (conn, connr)
>>> >   File "/usr/share/ceph/mgr/cephadm/serve.py", line 1256, in _run_cephadm
>>> >     code, '\n'.join(err)))
>>> > orchestrator._interface.OrchestratorError: cephadm exited with an error code: 1, stderr:Pulling container image s-8-2-1:/dev/bcache0...
>>> > Non-zero exit code 125 from /usr/bin/podman pull s-8-2-1:/dev/bcache0
>>> > /usr/bin/podman: stderr Error: invalid reference format
>>> > ERROR: Failed command: /usr/bin/podman pull s-8-2-1:/dev/bcache0
>>> > 2022-01-28T20:58:13.092996+0000 mgr.s-26-9-24-mon-m2.nhltmq (mgr.129738166) 392 : cephadm [ERR] cephadm exited with an error code: 1, stderr:Pulling container image s-8-2-1:/dev/bcache0...
>>> > Non-zero exit code 125 from /usr/bin/podman pull s-8-2-1:/dev/bcache0
>>> > /usr/bin/podman: stderr Error: invalid reference format
>>> > ERROR: Failed command: /usr/bin/podman pull s-8-2-1:/dev/bcache0
>>> > Traceback (most recent call last):
>>> >   File "/usr/share/ceph/mgr/cephadm/serve.py", line 1363, in _remote_connection
>>> >     yield (conn, connr)
>>> >   File "/usr/share/ceph/mgr/cephadm/serve.py", line 1256, in _run_cephadm
>>> >     code, '\n'.join(err)))
>>> > orchestrator._interface.OrchestratorError: cephadm exited with an error code: 1, stderr:Pulling container image s-8-2-1:/dev/bcache0...
>>> > Non-zero exit code 125 from /usr/bin/podman pull s-8-2-1:/dev/bcache0
>>> > /usr/bin/podman: stderr Error: invalid reference format
>>> > ERROR: Failed command: /usr/bin/podman pull s-8-2-1:/dev/bcache0
>>> >
>>> >>
>>> >> Thanks,
>>> >>
>>> >> - Adam King
>>> >>
>>> >> On Thu, Jan 27, 2022 at 7:06 PM Fyodor Ustinov <ufm@xxxxxx> wrote:
>>> >>
>>> >>> Hi!
>>> >>>
>>> >>> I think this happened after I tried to recreate the osd with the command
>>> >>> "ceph orch daemon add osd s-8-2-1:/dev/bcache0"
>>> >>>
>>> >>> > It looks like cephadm believes "s-8-2-1:/dev/bcache0" is a container
>>> >>> > image for some daemon. Can you provide the output of "ceph orch ls
>>> >>> > --format yaml",
>>> >>>
>>> >>> https://pastebin.com/CStBf4J0
>>> >>>
>>> >>> > "ceph orch upgrade status",
>>> >>> root@s-26-9-19-mon-m1:~# ceph orch upgrade status
>>> >>> {
>>> >>>     "target_image": null,
>>> >>>     "in_progress": false,
>>> >>>     "services_complete": [],
>>> >>>     "progress": null,
>>> >>>     "message": ""
>>> >>> }
>>> >>>
>>> >>> > "ceph config get mgr container_image",
>>> >>> root@s-26-9-19-mon-m1:~# ceph config get mgr container_image
>>> >>> quay.io/ceph/ceph@sha256:2f7f0af8663e73a422f797de605e769ae44eb0297f2a79324739404cc1765728
>>> >>>
>>> >>> > and the values for monitoring stack container images (format is "ceph
>>> >>> > config get mgr mgr/cephadm/container_image_<daemon-type>" where daemon
>>> >>> > type is one of "prometheus", "node_exporter", "alertmanager",
>>> >>> > "grafana", "haproxy", "keepalived").
>>> >>> quay.io/prometheus/prometheus:v2.18.1
>>> >>> quay.io/prometheus/node-exporter:v0.18.1
>>> >>> quay.io/prometheus/alertmanager:v0.20.0
>>> >>> quay.io/ceph/ceph-grafana:6.7.4
>>> >>> docker.io/library/haproxy:2.3
>>> >>> docker.io/arcts/keepalived
>>> >>>
>>> >>> >
>>> >>> > Thanks,
>>> >>> >
>>> >>> > - Adam King
>>> >>>
>>> >>> Thanks a lot!
>>> >>>
>>> >>> WBR,
>>> >>> Fyodor.
>>> >>>
>>> >>> >
>>> >>> > On Thu, Jan 27, 2022 at 9:10 AM Fyodor Ustinov <ufm@xxxxxx> wrote:
>>> >>> >
>>> >>> >> Hi!
>>> >>> >>
>>> >>> >> I rebooted the nodes with mgr and now I see the following in the
>>> >>> >> cephadm.log:
>>> >>> >>
>>> >>> >> As I understand it, cephadm is trying to execute some unsuccessful
>>> >>> >> command of mine (I wonder which one); it does not succeed, but it
>>> >>> >> keeps trying and trying. How do I stop it from trying?
>>> >>> >>
>>> >>> >> 2022-01-27 16:02:58,123 7fca7beca740 DEBUG
>>> >>> >> --------------------------------------------------------------------------------
>>> >>> >> cephadm ['--image', 's-8-2-1:/dev/bcache0', 'pull']
>>> >>> >> 2022-01-27 16:02:58,147 7fca7beca740 DEBUG /usr/bin/podman: 3.3.1
>>> >>> >> 2022-01-27 16:02:58,249 7fca7beca740 INFO Pulling container image s-8-2-1:/dev/bcache0...
>>> >>> >> 2022-01-27 16:02:58,278 7fca7beca740 DEBUG /usr/bin/podman: Error: invalid reference format
>>> >>> >> 2022-01-27 16:02:58,279 7fca7beca740 INFO Non-zero exit code 125 from /usr/bin/podman pull s-8-2-1:/dev/bcache0
>>> >>> >> 2022-01-27 16:02:58,279 7fca7beca740 INFO /usr/bin/podman: stderr Error: invalid reference format
>>> >>> >> 2022-01-27 16:02:58,279 7fca7beca740 ERROR ERROR: Failed command: /usr/bin/podman pull s-8-2-1:/dev/bcache0
>>> >>> >> 2022-01-27 16:03:58,420 7f897a7a6740 DEBUG
>>> >>> >> --------------------------------------------------------------------------------
>>> >>> >> cephadm ['--image', 's-8-2-1:/dev/bcache0', 'pull']
>>> >>> >> 2022-01-27 16:03:58,443 7f897a7a6740 DEBUG /usr/bin/podman: 3.3.1
>>> >>> >> 2022-01-27 16:03:58,547 7f897a7a6740 INFO Pulling container image s-8-2-1:/dev/bcache0...
>>> >>> >> 2022-01-27 16:03:58,575 7f897a7a6740 DEBUG /usr/bin/podman: Error: invalid reference format
>>> >>> >> 2022-01-27 16:03:58,577 7f897a7a6740 INFO Non-zero exit code 125 from /usr/bin/podman pull s-8-2-1:/dev/bcache0
>>> >>> >> 2022-01-27 16:03:58,577 7f897a7a6740 INFO /usr/bin/podman: stderr Error: invalid reference format
>>> >>> >> 2022-01-27 16:03:58,577 7f897a7a6740 ERROR ERROR: Failed command: /usr/bin/podman pull s-8-2-1:/dev/bcache0
>>> >>> >>
>>> >>> >> WBR,
>>> >>> >> Fyodor.
>>> >>> >> _______________________________________________
>>> >>> >> ceph-users mailing list -- ceph-users@xxxxxxx
>>> >>> >> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>> >>> >>
>>> >>>
>>> > _______________________________________________
>>> > ceph-users mailing list -- ceph-users@xxxxxxx
>>> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
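For reference, a minimal sketch of how a stray per-daemon override such as the
osd.91 container_image entry shown at the top of this message can usually be
inspected and removed with the standard "ceph config" commands. This assumes
the value behaves like any other per-daemon config option; the thread itself
does not confirm that clearing it also stops the cephadm pull loop.

    # list every container_image setting, including per-daemon overrides
    ceph config dump | grep container_image

    # show the value cephadm is picking up for osd.91
    ceph config get osd.91 container_image

    # drop the per-daemon override so osd.91 falls back to the global image
    ceph config rm osd.91 container_image

    # restart the active mgr so the cephadm module re-reads the configuration
    ceph mgr fail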