Re: Ceph Orchestrator (cephadm) stopped doing something

Redouane Kachach Elhichou <rkachach@xxxxxxxxxx> · Mon, 5 Dec 2022 12:33:20 +0100

ceph-volume commands sometimes hang while trying to access a device.
Please take a look at the steps Adam provided in the thread titled
"Issue adding host with cephadm - nothing is deployed" to check whether
cephadm is waiting for a ceph-volume command to complete. A hung command
there would explain why the orchestrator accepts commands but never acts
on them.
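
As a first check on each host, something along these lines can list
ceph-volume processes that have been running suspiciously long. This is
only a minimal sketch and not the steps from Adam's thread: the function
names, the 10-minute threshold, and the use of `ps -eo pid,etimes,args`
(procps-ng on Linux) are my own assumptions.

```python
import subprocess


def find_stuck_ceph_volume(ps_output, min_seconds=600):
    """Return (pid, elapsed_seconds, command) tuples for ceph-volume
    processes that have been running for at least min_seconds.

    ps_output is expected in the format of `ps -eo pid,etimes,args`.
    The 600-second default is an arbitrary, hypothetical threshold.
    """
    stuck = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)
        # Skip the header line and anything that does not look like
        # "<pid> <elapsed seconds> <command line>".
        if len(parts) < 3 or not parts[0].isdigit() or not parts[1].isdigit():
            continue
        pid, elapsed, args = int(parts[0]), int(parts[1]), parts[2].strip()
        if "ceph-volume" in args and elapsed >= min_seconds:
            stuck.append((pid, elapsed, args))
    return stuck


def list_processes():
    """Return pid, elapsed seconds and full command line of all processes."""
    return subprocess.run(
        ["ps", "-eo", "pid,etimes,args"],
        capture_output=True, text=True, check=True,
    ).stdout


if __name__ == "__main__":
    for pid, elapsed, args in find_stuck_ceph_volume(list_processes()):
        print(f"pid={pid} running for {elapsed}s: {args}")
```

Run it on each host of the cluster. A ceph-volume process that has been
running for hours may be what cephadm is waiting for.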

Regards,
Redo.

On Tue, Nov 29, 2022 at 8:55 AM Volker Racho <rgsw4000@xxxxxxxxx> wrote:

> Hi,
>
> ceph orch commands are not executed anymore in my cephadm-managed cluster
> (17.2.3) and I don't see why. Cluster is healthy and overall working,
> except for the orchestrator part.
>
> For instance, when I run `ceph orch redeploy ingress.rgw.default`, I see
> the command in the audit log, and cephadm logs the command followed by
> "_kick_serve_loop", and that's it. There are no further messages or
> errors, not even with debug logging enabled (ceph config set mgr
> mgr/cephadm/log_to_cluster_level debug; ceph -W cephadm --watch-debug),
> and the service is never redeployed.
>
> Nov 21 07:54:45 ceph-0.yy.xxxx.net bash[1262]: debug
> 2022-11-21T07:54:45.397+0000 7f7b6b527700  0 log_channel(audit) log [DBG] :
> from='client.38766115 -' entity='client.admin' cmd=[{"prefix": "orch",
> "action": "redeploy", "service_nam
> Nov 21 07:54:45 ceph-0.yy.xxxx.net bash[1262]: debug
> 2022-11-21T07:54:45.401+0000 7f7b6bd28700  0 [cephadm INFO root] Redeploy
> service ingress.rgw.default
> Nov 21 07:54:45 ceph-0.yy.xxxx.net bash[1262]: debug
> 2022-11-21T07:54:45.401+0000 7f7b6bd28700  0 log_channel(cephadm) log [INF]
> : Redeploy service ingress.rgw.default
> Nov 21 07:54:45 ceph-0.yy.xxxx.net bash[1262]: debug
> 2022-11-21T07:54:45.401+0000 7f7b6bd28700  0 log_channel(cephadm) log [DBG]
> : _kick_serve_loop
> Nov 21 07:54:45 ceph-0.yy.xxxx.net bash[1262]: debug
> 2022-11-21T07:54:45.401+0000 7f7b6bd28700  0 log_channel(cephadm) log [DBG]
> : _kick_serve_loop
>
> The same happens with many other ceph orch ... commands, including ceph
> orch upgrade.
>
> # ceph orch status
> Backend: cephadm
> Available: Yes
> Paused: No
>
> According to the status, the orchestrator is available and not paused. I
> have tried setting the backend to "" and back to "cephadm", pausing and
> resuming the orchestrator, clearing progress entries and so on, but
> nothing made the cluster execute the commands. SSH connections between
> hosts are working.
>
> Any ideas how to fix or even debug this? I am a bit lost on this.
>
> Regards, SW.
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx