Glad it's working. Honestly, I have no idea how that happened; I've never
seen it before. Let me know if you ever find out which command caused it.

- Adam King

On Tue, Feb 1, 2022 at 11:29 AM Fyodor Ustinov <ufm@xxxxxx> wrote:

> Hi!
>
> Adam! Big thanks!
>
> "ceph config rm osd.91 container_image" completely solved the problem.
> I don't understand why this happened, but at least now everything works.
>
> Thank you so much again!
>
>
> ----- Original Message -----
> > From: "Fyodor Ustinov" <ufm@xxxxxx>
> > To: "Adam King" <adking@xxxxxxxxxx>
> > Cc: "ceph-users" <ceph-users@xxxxxxx>
> > Sent: Tuesday, 1 February, 2022 18:12:16
> > Subject: Re: cephadm trouble
>
> > Hi!
> > YES! HERE IT IS!
> >
> >   global  basic  container_image  quay.io/ceph/ceph@sha256:2f7f0af8663e73a422f797de605e769ae44eb0297f2a79324739404cc1765728  *
> >   osd.91  basic  container_image  s-8-2-1:/dev/bcache0
> >
> > Two questions:
> > 1. How did it get there?
> > 2. How do I delete it? As far as I understand, this field is not
> >    editable.
> >
> >
> > ----- Original Message -----
> >> From: "Adam King" <adking@xxxxxxxxxx>
> >> To: "Fyodor Ustinov" <ufm@xxxxxx>
> >> Cc: "ceph-users" <ceph-users@xxxxxxx>
> >> Sent: Tuesday, 1 February, 2022 17:45:13
> >> Subject: Re: Re: cephadm trouble
> >
> >> As a follow-up to my previous comment, could you also post "ceph config
> >> dump | grep container_image"? It's related to the repo-digest thing,
> >> and it's another way we could maybe discover where "s-8-2-1:/dev/bcache0"
> >> is set as an image.
> >>
> >> - Adam King
> >>
> >> On Tue, Feb 1, 2022 at 8:52 AM Adam King <adking@xxxxxxxxxx> wrote:
> >>
> >>> Hi Fyodor,
> >>>
> >>> Honestly, I'm quite confused by your case. "ceph orch daemon add osd"
> >>> is meant to be a one-time, synchronous command, so the idea that it is
> >>> causing these repeated pulls is very odd. I think I would need some
> >>> sort of list of the commands run on this cluster, or some kind of
> >>> reproducer. As mentioned before, cephadm definitely thinks
> >>> "s-8-2-1:/dev/bcache0" is the name of a container image, but I can't
> >>> think of where that is set; I didn't see it in any of the posted
> >>> service specs or in the config options for any of the images. Yet it
> >>> clearly must be set somewhere, or we wouldn't be trying to pull it
> >>> repeatedly. I've never seen an issue like this before. This is a total
> >>> long shot, but you could try setting "ceph config set mgr
> >>> mgr/cephadm/use_repo_digest false" and see if it at least lets you
> >>> refresh the daemons and make progress (or at least gets us different
> >>> things in the logs).
> >>>
> >>> Sorry for not being more helpful,
> >>>
> >>> - Adam King
> >>>
> >>> On Tue, Feb 1, 2022 at 3:27 AM Fyodor Ustinov <ufm@xxxxxx> wrote:
> >>>
> >>>> Hi!
> >>>>
> >>>> No more ideas? :(
> >>>>
> >>>>
> >>>> ----- Original Message -----
> >>>> > From: "Fyodor Ustinov" <ufm@xxxxxx>
> >>>> > To: "Adam King" <adking@xxxxxxxxxx>
> >>>> > Cc: "ceph-users" <ceph-users@xxxxxxx>
> >>>> > Sent: Friday, 28 January, 2022 23:02:26
> >>>> > Subject: Re: cephadm trouble
> >>>>
> >>>> > Hi!
> >>>> >
> >>>> >> Hmm, I'm not seeing anything that could be a cause in any of that
> >>>> >> output. I did notice, however, from your "ceph orch ls" output
> >>>> >> that none of your services have been refreshed since the 24th.
> >>>> >> Cephadm typically tries to refresh these things every 10 minutes,
> >>>> >> so that signals something is quite wrong.
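> >>>> >> As a quick sanity check, the REFRESHED column of plain "ceph orch
> >>>> >> ls" makes that staleness easy to spot. On a healthy cluster it
> >>>> >> should show single-digit minutes, something like this (illustrative
> >>>> >> output only; the names and counts here are made up):
> >>>> >>
> >>>> >>   NAME  PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
> >>>> >>   mgr          2/2      9m ago     10w  count:2
> >>>> >>   mon          5/5      9m ago     10w  count:5
> >>>> >>
> >>>> >> Multi-day values there mean the cephadm serve loop is stuck on
> >>>> >> something, which matches what we're seeing here.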
> >>>> > From what I see in /var/log/ceph/cephadm.log, it tries to run the
> >>>> > same command once a minute and does nothing else. That's why the
> >>>> > status has not been updated for 5 days.
> >>>> >
> >>>> >> Could you try running "ceph mgr fail", and if nothing seems to be
> >>>> >> resolved, could you post "ceph log last 200 debug cephadm"? Maybe
> >>>> >> we can see if something gets stuck again after the mgr restarts.
> >>>> > "ceph mgr fail" did not help.
> >>>> > "ceph log last 200 debug cephadm" shows the same thing again and
> >>>> > again:
> >>>> >
> >>>> > 2022-01-28T20:57:12.792090+0000 mgr.s-26-9-24-mon-m2.nhltmq (mgr.129738166) 349
> >>>> > : cephadm [ERR] cephadm exited with an error code: 1, stderr:Pulling container
> >>>> > image s-8-2-1:/dev/bcache0...
> >>>> > Non-zero exit code 125 from /usr/bin/podman pull s-8-2-1:/dev/bcache0
> >>>> > /usr/bin/podman: stderr Error: invalid reference format
> >>>> > ERROR: Failed command: /usr/bin/podman pull s-8-2-1:/dev/bcache0
> >>>> > Traceback (most recent call last):
> >>>> >   File "/usr/share/ceph/mgr/cephadm/serve.py", line 1363, in _remote_connection
> >>>> >     yield (conn, connr)
> >>>> >   File "/usr/share/ceph/mgr/cephadm/serve.py", line 1256, in _run_cephadm
> >>>> >     code, '\n'.join(err)))
> >>>> > orchestrator._interface.OrchestratorError: cephadm exited with an error code: 1,
> >>>> > stderr:Pulling container image s-8-2-1:/dev/bcache0...
> >>>> > Non-zero exit code 125 from /usr/bin/podman pull s-8-2-1:/dev/bcache0
> >>>> > /usr/bin/podman: stderr Error: invalid reference format
> >>>> > ERROR: Failed command: /usr/bin/podman pull s-8-2-1:/dev/bcache0
> >>>> > 2022-01-28T20:58:13.092996+0000 mgr.s-26-9-24-mon-m2.nhltmq (mgr.129738166) 392
> >>>> > : cephadm [ERR] cephadm exited with an error code: 1, stderr:Pulling container
> >>>> > image s-8-2-1:/dev/bcache0...
> >>>> > [the same podman error and traceback repeat for every entry]
> >>>> >
> >>>> >> Thanks,
> >>>> >>
> >>>> >> - Adam King
> >>>> >>
> >>>> >> On Thu, Jan 27, 2022 at 7:06 PM Fyodor Ustinov <ufm@xxxxxx> wrote:
> >>>> >>
> >>>> >>> Hi!
> >>>> >>>
> >>>> >>> I think this happened after I tried to recreate the OSD with the
> >>>> >>> command "ceph orch daemon add osd s-8-2-1:/dev/bcache0".
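> >>>> >>> (For what it's worth, that is the documented host:device form, so
> >>>> >>> the command itself should be well-formed:
> >>>> >>>
> >>>> >>>   ceph orch daemon add osd <host>:<device-path>
> >>>> >>>   ceph orch daemon add osd s-8-2-1:/dev/bcache0
> >>>> >>>
> >>>> >>> which makes it even stranger that the host:device string ended up
> >>>> >>> being treated as an image name.)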
> >>>> >>>
> >>>> >>> > It looks like cephadm believes "s-8-2-1:/dev/bcache0" is a
> >>>> >>> > container image for some daemon. Can you provide the output of
> >>>> >>> > "ceph orch ls --format yaml",
> >>>> >>>
> >>>> >>> https://pastebin.com/CStBf4J0
> >>>> >>>
> >>>> >>> > "ceph orch upgrade status",
> >>>> >>> root@s-26-9-19-mon-m1:~# ceph orch upgrade status
> >>>> >>> {
> >>>> >>>     "target_image": null,
> >>>> >>>     "in_progress": false,
> >>>> >>>     "services_complete": [],
> >>>> >>>     "progress": null,
> >>>> >>>     "message": ""
> >>>> >>> }
> >>>> >>>
> >>>> >>> > "ceph config get mgr container_image",
> >>>> >>> root@s-26-9-19-mon-m1:~# ceph config get mgr container_image
> >>>> >>> quay.io/ceph/ceph@sha256:2f7f0af8663e73a422f797de605e769ae44eb0297f2a79324739404cc1765728
> >>>> >>>
> >>>> >>> > and the values for monitoring stack container images (format is
> >>>> >>> > "ceph config get mgr mgr/cephadm/container_image_<daemon-type>",
> >>>> >>> > where daemon type is one of "prometheus", "node_exporter",
> >>>> >>> > "alertmanager", "grafana", "haproxy", "keepalived").
> >>>> >>> quay.io/prometheus/prometheus:v2.18.1
> >>>> >>> quay.io/prometheus/node-exporter:v0.18.1
> >>>> >>> quay.io/prometheus/alertmanager:v0.20.0
> >>>> >>> quay.io/ceph/ceph-grafana:6.7.4
> >>>> >>> docker.io/library/haproxy:2.3
> >>>> >>> docker.io/arcts/keepalived
> >>>> >>>
> >>>> >>> >
> >>>> >>> > Thanks,
> >>>> >>> >
> >>>> >>> > - Adam King
> >>>> >>>
> >>>> >>> Thanks a lot!
> >>>> >>>
> >>>> >>> WBR,
> >>>> >>> Fyodor.
> >>>> >>>
> >>>> >>> > On Thu, Jan 27, 2022 at 9:10 AM Fyodor Ustinov <ufm@xxxxxx> wrote:
> >>>> >>> >
> >>>> >>> >> Hi!
> >>>> >>> >>
> >>>> >>> >> I rebooted the nodes with mgr, and now I see the following in
> >>>> >>> >> the cephadm.log.
> >>>> >>> >>
> >>>> >>> >> As I understand it, cephadm is trying to execute some
> >>>> >>> >> unsuccessful command of mine (I wonder which one); it does not
> >>>> >>> >> succeed, but it keeps trying and trying. How do I stop it from
> >>>> >>> >> trying?
> >>>> >>> >>
> >>>> >>> >> 2022-01-27 16:02:58,123 7fca7beca740 DEBUG --------------------------------------------------------------------------------
> >>>> >>> >> cephadm ['--image', 's-8-2-1:/dev/bcache0', 'pull']
> >>>> >>> >> 2022-01-27 16:02:58,147 7fca7beca740 DEBUG /usr/bin/podman: 3.3.1
> >>>> >>> >> 2022-01-27 16:02:58,249 7fca7beca740 INFO Pulling container image s-8-2-1:/dev/bcache0...
> >>>> >>> >> 2022-01-27 16:02:58,278 7fca7beca740 DEBUG /usr/bin/podman: Error: invalid reference format
> >>>> >>> >> 2022-01-27 16:02:58,279 7fca7beca740 INFO Non-zero exit code 125 from /usr/bin/podman pull s-8-2-1:/dev/bcache0
> >>>> >>> >> 2022-01-27 16:02:58,279 7fca7beca740 INFO /usr/bin/podman: stderr Error: invalid reference format
> >>>> >>> >> 2022-01-27 16:02:58,279 7fca7beca740 ERROR ERROR: Failed command: /usr/bin/podman pull s-8-2-1:/dev/bcache0
> >>>> >>> >> 2022-01-27 16:03:58,420 7f897a7a6740 DEBUG --------------------------------------------------------------------------------
> >>>> >>> >> cephadm ['--image', 's-8-2-1:/dev/bcache0', 'pull']
> >>>> >>> >> 2022-01-27 16:03:58,443 7f897a7a6740 DEBUG /usr/bin/podman: 3.3.1
> >>>> >>> >> 2022-01-27 16:03:58,547 7f897a7a6740 INFO Pulling container image s-8-2-1:/dev/bcache0...
> >>>> >>> >> 2022-01-27 16:03:58,575 7f897a7a6740 DEBUG /usr/bin/podman: Error: invalid reference format
> >>>> >>> >> 2022-01-27 16:03:58,577 7f897a7a6740 INFO Non-zero exit code 125 from /usr/bin/podman pull s-8-2-1:/dev/bcache0
> >>>> >>> >> 2022-01-27 16:03:58,577 7f897a7a6740 INFO /usr/bin/podman: stderr Error: invalid reference format
> >>>> >>> >> 2022-01-27 16:03:58,577 7f897a7a6740 ERROR ERROR: Failed command: /usr/bin/podman pull s-8-2-1:/dev/bcache0
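> >>>> >>> >> (If I read the podman error correctly, an image reference has
> >>>> >>> >> to look like [registry/]name[:tag] or name@sha256:<digest>, for
> >>>> >>> >> example:
> >>>> >>> >>
> >>>> >>> >>   quay.io/ceph/ceph:v16.2.7
> >>>> >>> >>   quay.io/ceph/ceph@sha256:2f7f0af8663e73a422f797de605e769ae44eb0297f2a79324739404cc1765728
> >>>> >>> >>
> >>>> >>> >> "/dev/bcache0" is not a valid tag, hence "invalid reference
> >>>> >>> >> format". So the real question is why cephadm is passing my
> >>>> >>> >> host:device argument to "podman pull" as an image at all.)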
> >>>> >>> >>
> >>>> >>> >> WBR,
> >>>> >>> >> Fyodor.

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx