Re: ceph orch command not working anymore on squid (19.2.1)

I think it's worth a shot to clean up those Grafana config-keys. Grafana isn't really critical, so you can just remove those keys and then try another mgr failover.
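
Something along these lines should do it (key names taken from the bash history you posted below; check with 'ceph config-key ls' first to see what is actually stored before removing anything):

# ceph config-key ls | grep -i grafana
# ceph config-key rm mgr/cephadm/cert_store.cert.grafana_cert
# ceph config-key rm mgr/cephadm/cert_store.key.grafana_key
# ceph config-key rm mgr/cephadm/cert_store.key.grafana_crt
# ceph config-key rm mgr/cephadm/ceph01.inf.ethz.ch/grafana_key
# ceph config-key rm mgr/cephadm/ceph01.inf.ethz.ch/grafana_crt
# ceph mgr fail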

Quoting Moritz Baumann <mo@xxxxxxxxxxxxx>:

ceph01[0]:~# ceph -s
  cluster:
    id:     4cfb1138-d2a5-11ef-87eb-1c34da408e52
    health: HEALTH_OK

  services:
    mon: 5 daemons, quorum ceph01,ceph02,ceph03,ceph04,ceph05 (age 4d)
    mgr: ceph01.ombogc(active, since 7m), standbys: ceph02.nqlpce
    mds: 1/1 daemons up, 3 standby, 1 hot standby
    osd: 30 osds: 30 up (since 4d), 30 in (since 8w)
    rgw: 2 daemons active (2 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   11 pools, 1393 pgs
    objects: 2.72M objects, 738 GiB
    usage:   2.2 TiB used, 417 TiB / 419 TiB avail
    pgs:     1393 active+clean

  io:
    client:   2.6 KiB/s rd, 404 KiB/s wr, 3 op/s rd, 21 op/s wr

ceph01[0]:~# ceph health detail
HEALTH_OK


These queries yielded nothing suspicious:
 525  [2025-03-12-22:00:07] ceph config-key get mgr/cephadm/inventory
  526  [2025-03-12-22:01:00] ceph config-key get mgr/cephadm/inventory | jq
  527  [2025-03-12-22:02:59] ceph config-key get mgr/cephadm/host.ceph01.inf.ethz.ch
  528  [2025-03-12-22:03:03] ceph config-key get mgr/cephadm/host.ceph01.inf.ethz.ch | jq
  529  [2025-03-12-22:03:58] ceph config-key get mgr/cephadm/host.ceph01.inf.ethz.ch | jq | grep MethodAccessor
  530  [2025-03-12-22:04:03] ceph config-key get mgr/cephadm/host.ceph02.inf.ethz.ch | jq | grep MethodAccessor
  531  [2025-03-12-22:04:06] ceph config-key get mgr/cephadm/host.ceph03.inf.ethz.ch | jq | grep MethodAccessor
  532  [2025-03-12-22:04:10] ceph config-key get mgr/cephadm/host.ceph04.inf.ethz.ch | jq | grep MethodAccessor
  533  [2025-03-12-22:04:14] ceph config-key get mgr/cephadm/host.ceph05.inf.ethz.ch | jq | grep MethodAccessor
  534  [2025-03-12-22:04:19] ceph config-key get mgr/cephadm/host.ceph01.inf.ethz.ch | jq
  535  [2025-03-12-22:04:23] ceph config-key get mgr/cephadm/host.ceph01.inf.ethz.ch | jq | less
  578  [2025-03-13-15:03:19] ceph config-key get mgr/cephadm/host.ceph02.inf.ethz.ch | jq | less
  579  [2025-03-13-15:04:08] ceph config-key get mgr/cephadm/host.ceph02.inf.ethz.ch | jq | grep hostname
  580  [2025-03-13-15:04:15] ceph config-key get mgr/cephadm/host.ceph01.inf.ethz.ch | jq | grep hostname
  581  [2025-03-13-15:04:21] ceph config-key get mgr/cephadm/host.ceph03.inf.ethz.ch | jq | grep hostname
  582  [2025-03-13-15:04:26] ceph config-key get mgr/cephadm/host.ceph04.inf.ethz.ch | jq | grep hostname
  583  [2025-03-13-15:04:31] ceph config-key get mgr/cephadm/host.ceph05.inf.ethz.ch | jq | grep hostname
  584  [2025-03-13-15:04:39] ceph config-key get mgr/cephadm/host.ceph03.inf.ethz.ch | jq | less
  585  [2025-03-13-15:18:33] ceph config-key get mgr/cephadm/host.ceph04.inf.ethz.ch | jq | less
  586  [2025-03-13-15:19:34] ceph config-key get mgr/cephadm/host.ceph05.inf.ethz.ch | jq | less




But when I looked into my bash history, I saw the following (I had done this because I got Grafana cert errors, and there were postings saying this was the way to get a new self-signed cert):


  196  [2025-01-30-10:56:08] ceph config-key set mgr/cephadm/cert_store.cert.grafana_cert -i grafana_certs.json
  197  [2025-01-30-10:56:32] ceph config-key get mgr/cephadm/cert_store.cert.grafana_cert
  199  [2025-01-30-10:57:26] ceph config-key set mgr/cephadm/cert_store.key.grafana_key -i grafana_keys.json
  200  [2025-01-30-10:57:33] ceph config-key get mgr/cephadm/cert_store.key.grafana_key
  225  [2025-02-11-11:09:42] ceph config-key set mgr/cephadm/ceph01.inf.ethz.ch/grafana_key -i
  226  [2025-02-11-11:10:03] ceph config-key set mgr/cephadm/ceph01.inf.ethz.ch/grafana_key
  227  [2025-02-11-11:10:30] ceph config-key set mgr/cephadm/ceph01.inf.ethz.ch/grafana_crt
  253  [2025-02-25-09:48:32] ceph config-key get mgr/cephadm/cert_store.key.grafana_key
  254  [2025-02-25-09:48:42] ceph config-key set mgr/cephadm/cert_store.key.grafana_key
  255  [2025-02-25-09:48:48] ceph config-key set mgr/cephadm/cert_store.key.grafana_crt

So maybe fixing this will fix the orchestrator?
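
I guess before removing anything I could check what actually ended up in those keys, e.g.:

# ceph config-key get mgr/cephadm/cert_store.cert.grafana_cert
# ceph config-key get mgr/cephadm/cert_store.key.grafana_key

The "Detected invalid grafana certificate ... no start line" errors in the cephadm log below would fit with that content not being valid PEM.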





On 12.03.2025 at 20:58, Adam King wrote:
Regarding the

ValueError: "'xwork.MethodAccessor.denyMethodExecution'" does not appear to be an IPv4 or IPv6 address

can you check `ceph config-key get mgr/cephadm/inventory` and see whether anything related to that shows up (such as "'xwork.MethodAccessor.denyMethodExecution'" appearing as the addr for a host)? If nothing is there, there should also be a number of `mgr/cephadm/host.<hostname>` entries, one of which may contain something related as well. Cephadm stores all of its state in mgr/cephadm/... entries in the config-key store, so if the module keeps failing after mgr failovers, there is likely something in there that shouldn't be, and your only recourse at this point might be to manually edit one of those entries before failing over the mgr again. The output of `ceph health detail` could also point us in the right direction if the module is fully crashing (sorry if I'm requesting something you've already posted).
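
A quick way to scan all of the cephadm state for that string, and to edit an entry if one turns up, could look roughly like this (ceph05 is just an example hostname; `ceph config-key dump` prints the keys together with their values):

# ceph config-key dump | grep -i methodaccessor
# ceph config-key get mgr/cephadm/host.ceph05.inf.ethz.ch > host.json
(edit host.json by hand, then write it back and fail over)
# ceph config-key set mgr/cephadm/host.ceph05.inf.ethz.ch -i host.json
# ceph mgr fail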


On Wed, Mar 12, 2025 at 2:26 PM Eugen Block <eblock@xxxxxx> wrote:

   I'm still not sure we were looking at the right logs. Could you
   revert the changes to the unit.run file and then fail the mgr to get
   a new set of logs? Then, on the node with the active mgr, run 'cephadm
   logs --name mgr.<active_mgr>' and paste the relevant output in a
   pastebin or similar. Redact sensitive information. Alternatively, you
   can get the logs with 'journalctl -u
   ceph-<cluster_fsid>@mgr.<active_mgr>'. I would also like to understand
   what's going on rather than nuking the cluster and hoping that next
   time it works (in production).
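
   With the fsid and mgr names from this thread, that should be something
   like one of (adjust the mgr name to whichever one is currently active):

   # cephadm logs --name mgr.ceph01.ombogc
   # journalctl -u ceph-4cfb1138-d2a5-11ef-87eb-1c34da408e52@mgr.ceph01.ombogc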

   Quoting Moritz Baumann <mo@xxxxxxxxxxxxx>:

     > The cluster was successfully upgraded to 19.2.1 and had been running
     > for some time with that version before it broke.
    >
    > As you suggested I have changed the unit.run file and replaced the
    > sha256:XYZ.... with the one from
     > https://quay.io/repository/ceph/ceph?tab=tags
    >
    > (
    >
    >
   sha256:200087c35811bf28e8a8073b15fa86c07cce85c575f1ccd62d1d6ddbfdc6770a
    >
    > )
    >
    >
     > ceph01[1]:/var/lib/ceph/4cfb1138-d2a5-11ef-87eb-1c34da408e52/mgr.ceph01.ombogc# diff unit.run* -Nur
    > --- unit.run    2025-03-12 13:33:43.455465357 +0100
    > +++ unit.run.bak    2025-03-12 13:32:44.713805212 +0100
    > @@ -5,4 +5,4 @@
    >  ! /usr/bin/podman rm -f
    > ceph-4cfb1138-d2a5-11ef-87eb-1c34da408e52-mgr-ceph01-ombogc 2>
    > /dev/null
    >  ! /usr/bin/podman rm -f --storage
    > ceph-4cfb1138-d2a5-11ef-87eb-1c34da408e52-mgr-ceph01-ombogc 2>
    > /dev/null
    >  ! /usr/bin/podman rm -f --storage
    > ceph-4cfb1138-d2a5-11ef-87eb-1c34da408e52-mgr.ceph01.ombogc 2>
    > /dev/null
     > -/usr/bin/podman run --rm --ipc=host --stop-signal=SIGTERM
     > --net=host --entrypoint /usr/bin/ceph-mgr --init --name
     > ceph-4cfb1138-d2a5-11ef-87eb-1c34da408e52-mgr-ceph01-ombogc
     > --pids-limit=-1 -d --log-driver journald --conmon-pidfile
     > /run/ceph-4cfb1138-d2a5-11ef-87eb-1c34da408e52@mgr.ceph01.ombogc.service-pid
     > --cidfile /run/ceph-4cfb1138-d2a5-11ef-87eb-1c34da408e52@mgr.ceph01.ombogc.service-cid
     > --cgroups=split
     > -e CONTAINER_IMAGE=quay.io/ceph/ceph@sha256:200087c35811bf28e8a8073b15fa86c07cce85c575f1ccd62d1d6ddbfdc6770a
     > -e NODE_NAME=ceph01.inf.ethz.ch
     > -e TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728
     > -v /var/run/ceph/4cfb1138-d2a5-11ef-87eb-1c34da408e52:/var/run/ceph:z
     > -v /var/log/ceph/4cfb1138-d2a5-11ef-87eb-1c34da408e52:/var/log/ceph:z
     > -v /var/lib/ceph/4cfb1138-d2a5-11ef-87eb-1c34da408e52/crash:/var/lib/ceph/crash:z
     > -v /run/systemd/journal:/run/systemd/journal
     > -v /var/lib/ceph/4cfb1138-d2a5-11ef-87eb-1c34da408e52/mgr.ceph01.ombogc:/var/lib/ceph/mgr/ceph-ceph01.ombogc:z
     > -v /var/lib/ceph/4cfb1138-d2a5-11ef-87eb-1c34da408e52/mgr.ceph01.ombogc/config:/etc/ceph/ceph.conf:z
     > -v /etc/hosts:/etc/hosts:ro
     > quay.io/ceph/ceph@sha256:200087c35811bf28e8a8073b15fa86c07cce85c575f1ccd62d1d6ddbfdc6770a
     > -n mgr.ceph01.ombogc -f --setuser ceph --setgroup ceph
     > --default-log-to-file=false --default-log-to-journald=true
     > --default-log-to-stderr=false
     > +/usr/bin/podman run --rm --ipc=host --stop-signal=SIGTERM
     > --net=host --entrypoint /usr/bin/ceph-mgr --init --name
     > ceph-4cfb1138-d2a5-11ef-87eb-1c34da408e52-mgr-ceph01-ombogc
     > --pids-limit=-1 -d --log-driver journald --conmon-pidfile
     > /run/ceph-4cfb1138-d2a5-11ef-87eb-1c34da408e52@mgr.ceph01.ombogc.service-pid
     > --cidfile /run/ceph-4cfb1138-d2a5-11ef-87eb-1c34da408e52@mgr.ceph01.ombogc.service-cid
     > --cgroups=split
     > -e CONTAINER_IMAGE=quay.io/ceph/ceph@sha256:41d3f5e46ff7de28544cc8869fdea13fca824dcef83936cb3288ed9de935e4de
     > -e NODE_NAME=ceph01.inf.ethz.ch
     > -e TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728
     > -v /var/run/ceph/4cfb1138-d2a5-11ef-87eb-1c34da408e52:/var/run/ceph:z
     > -v /var/log/ceph/4cfb1138-d2a5-11ef-87eb-1c34da408e52:/var/log/ceph:z
     > -v /var/lib/ceph/4cfb1138-d2a5-11ef-87eb-1c34da408e52/crash:/var/lib/ceph/crash:z
     > -v /run/systemd/journal:/run/systemd/journal
     > -v /var/lib/ceph/4cfb1138-d2a5-11ef-87eb-1c34da408e52/mgr.ceph01.ombogc:/var/lib/ceph/mgr/ceph-ceph01.ombogc:z
     > -v /var/lib/ceph/4cfb1138-d2a5-11ef-87eb-1c34da408e52/mgr.ceph01.ombogc/config:/etc/ceph/ceph.conf:z
     > -v /etc/hosts:/etc/hosts:ro
     > quay.io/ceph/ceph@sha256:41d3f5e46ff7de28544cc8869fdea13fca824dcef83936cb3288ed9de935e4de
     > -n mgr.ceph01.ombogc -f --setuser ceph --setgroup ceph
     > --default-log-to-file=false --default-log-to-journald=true
     > --default-log-to-stderr=false
    >
    >
    > stopped the 19.2.1 version
    >
    > # cephadm unit stop --name mgr.ceph01.ombogc
    > # cephadm unit start --name mgr.ceph01.ombogc
    >
    > and failed to the other mgrs
    >
    > # ceph mgr fail ceph01.ombogc
    >
    > # ceph orch host ls
    >
    > Error ENOENT: Module not found
    >
    > # ceph mgr fail ceph02.nqlpce
    >
    > # ceph orch host ls
    >
    > Error ENOENT: Module not found
    >
    >
    >
    > # podman exec -ti <id of the mgr> /bin/bash
    >
    > in the container
    >
    > ceph --version
    >
    >
    > on both nodes shows that one is running 19.2.1 and the other one is
    > running 19.2.0
    >
    >
    > but no matter which mgr is active the ceph orch commands do not work.
    >
    >
     > We were only in the process of putting this cluster into production, so
     > if it comes to the worst I can copy the little data on it somewhere and
     > reinstall, but I hoped to find a solution (so that if this ever happens
     > again there is a known way to recover).
    >
    >
    >
    >
    >
    >
    > On 3/10/25 22:55, Eugen Block wrote:
    >> I don’t have a better idea than to downgrade one MGR by editing the
    >> unit.run file to use a different version (19.2.0 seems to have
    >> worked, right?), restart it, then fail the MGRs until the older one
    >> takes over (or stop all newer ones). And then check if the
    >> orchestrator still misbehaves.
    >>
    >>
     >> Quoting Moritz Baumann <mo@xxxxxxxxxxxxx>:
    >>
    >>> It was installed with 19.2.0:
    >>>
    >>> dnf history info 20 shows:
    >>>
    >>> ceph01[0]:~# dnf history info 20
    >>> Transaction ID : 20
    >>> Begin time     : Tue 14 Jan 2025 07:22:00 PM CET
    >>> Begin rpmdb    :
    >>> 0fac0c11ba074f9635117d0b3d4970e6d26e8d2b40bfe8b93d22fd12147260b7
    >>> End time       : Tue 14 Jan 2025 07:22:04 PM CET (4 seconds)
    >>> End rpmdb      :
    >>> 63657f6d4b27790107998bd21da76fe182d9c121177b34cf879243884deba40a
    >>> User           : root <root>
    >>> Return-Code    : Success
    >>> Releasever     : 9
    >>> Command Line   : install -y ceph-common cephadm
    >>> Comment        :
    >>> Packages Altered:
    >>>     Install boost-program-options-1.75.0-8.el9.x86_64 @appstream
    >>>     Install libbabeltrace-1.5.8-10.el9.x86_64 @appstream
    >>>     Install libpmem-1.12.1-1.el9.x86_64 @appstream
    >>>     Install libpmemobj-1.12.1-1.el9.x86_64 @appstream
    >>>     Install librabbitmq-0.11.0-7.el9.x86_64 @appstream
    >>>     Install librdkafka-1.6.1-102.el9.x86_64 @appstream
    >>>     Install lttng-ust-2.12.0-6.el9.x86_64 @appstream
    >>>     Install python3-prettytable-0.7.2-27.el9.noarch @appstream
    >>>     Install daxctl-libs-78-2.el9.x86_64                 @baseos
    >>>     Install librdmacm-51.0-1.el9.x86_64                 @baseos
    >>>     Install ndctl-libs-78-2.el9.x86_64                  @baseos
    >>>     Install ceph-common-2:19.2.0-0.el9.x86_64           @Ceph
    >>>     Install libcephfs2-2:19.2.0-0.el9.x86_64            @Ceph
    >>>     Install librados2-2:19.2.0-0.el9.x86_64             @Ceph
    >>>     Install libradosstriper1-2:19.2.0-0.el9.x86_64      @Ceph
    >>>     Install librbd1-2:19.2.0-0.el9.x86_64               @Ceph
    >>>     Install librgw2-2:19.2.0-0.el9.x86_64               @Ceph
    >>>     Install python3-ceph-argparse-2:19.2.0-0.el9.x86_64 @Ceph
    >>>     Install python3-ceph-common-2:19.2.0-0.el9.x86_64   @Ceph
    >>>     Install python3-cephfs-2:19.2.0-0.el9.x86_64        @Ceph
    >>>     Install python3-rados-2:19.2.0-0.el9.x86_64         @Ceph
    >>>     Install python3-rbd-2:19.2.0-0.el9.x86_64           @Ceph
    >>>     Install python3-rgw-2:19.2.0-0.el9.x86_64           @Ceph
    >>>     Install cephadm-2:19.2.0-0.el9.noarch @Ceph-noarch
    >>>     Install gperftools-libs-2.9.1-3.el9.x86_64          @epel
    >>>     Install libarrow-9.0.0-13.el9.x86_64                @epel
    >>>     Install libarrow-doc-9.0.0-13.el9.noarch            @epel
    >>>     Install liboath-2.6.12-1.el9.x86_64                 @epel
    >>>     Install libunwind-1.6.2-1.el9.x86_64                @epel
    >>>     Install parquet-libs-9.0.0-13.el9.x86_64            @epel
    >>>     Install re2-1:20211101-20.el9.x86_64                @epel
    >>>     Install thrift-0.15.0-4.el9.x86_64                  @epel
    >>> Scriptlet output:
    >>>    1 Detected autofs mount point /home during canonicalization
   of /home.
    >>>    2 Skipping /home
    >>>
     >>> I don't have that output, but here is the bash history:
    >>>
    >>>     1  [2025-01-14-19:21:39] dnf install -y
     >>> https://download.ceph.com/rpm-19.2.0/el9/noarch/ceph-release-1-1.el9.noarch.rpm
    >>>     2  [2025-01-14-19:21:41] dnf install -y ceph-common cephadm
    >>>     3  [2025-01-14-19:22:08] host 172.31.78.11
    >>>     4  [2025-01-14-19:22:22] cephadm bootstrap --mon-ip
     >>> 172.31.78.11 --cluster-network 172.31.14.224/27
    >>> --allow-fqdn-hostname | tee initial_cephadm_out
    >>>     5  [2025-01-14-19:23:13] cp /etc/ceph/ceph.pub
    >>> /root/.ssh/authorized_keys2
    >>>     6  [2025-01-14-19:23:13] chmod 600 /root/.ssh/authorized_keys2
    >>>     7  [2025-01-14-19:23:26] ceph telemetry on --license
   sharing-1-0
    >>>     8  [2025-01-14-19:23:26] ceph telemetry enable channel perf
    >>>     9  [2025-01-14-19:24:13] nmcli connection add type vlan
     >>> con-name backend ifname backend vlan.id 2987 vlan.parent bond0
     >>> ipv4.method manual ipv4.addresses 172.31.14.231/27 ipv4.may-fail
    >>> no ipv6.method disabled ipv6.may-fail yes connection.zone trusted
    >>>    10  [2025-01-14-19:27:18] ceph mgr module disable cephadm
    >>>    11  [2025-01-14-19:27:26] ceph fsid
    >>>    12  [2025-01-14-19:27:36] cephadm rm-cluster --force --zap-osds
    >>> --fsid 7f8377c2-d2a4-11ef-adc1-1c34da408e52
    >>>    13  [2025-01-14-19:28:07] cephadm bootstrap --mon-ip
     >>> 172.31.78.11 --cluster-network 172.31.14.224/27
    >>> --allow-fqdn-hostname | tee initial_cephadm_out
    >>>    14  [2025-01-14-19:28:46] cp /etc/ceph/ceph.pub
    >>> /root/.ssh/authorized_keys2
    >>>    15  [2025-01-14-19:28:49] chmod 600 /root/.ssh/authorized_keys2
    >>>    16  [2025-01-14-19:29:05] ceph telemetry on --license
   sharing-1-0
    >>>    17  [2025-01-14-19:29:06] ceph telemetry enable channel perf
    >>>    18  [2025-01-14-19:29:17] scp /root/.ssh/authorized_keys2
    >>> ceph02.inf.ethz.ch:/root/.ssh/authorized_keys2
    >>>    19  [2025-01-14-19:29:19] scp /root/.ssh/authorized_keys2
    >>> ceph03.inf.ethz.ch:/root/.ssh/authorized_keys2
    >>>    20  [2025-01-14-19:29:21] scp /root/.ssh/authorized_keys2
    >>> ceph04.inf.ethz.ch:/root/.ssh/authorized_keys2
    >>>    21  [2025-01-14-19:29:22] scp /root/.ssh/authorized_keys2
    >>> ceph05.inf.ethz.ch:/root/.ssh/authorized_keys2
    >>>    22  [2025-01-14-19:29:24] yes
    >>>    23  [2025-01-14-19:30:15] ceph config set osd
   osd_memory_target 8G
    >>>    24  [2025-01-14-19:30:22] ceph config set mds
   mds_cache_memory_limit 64G
     >>>    25  [2025-01-14-19:30:30] ceph orch apply osd --all-available-devices
     >>>    26  [2025-01-14-19:32:27] ceph orch host add ceph02.inf.ethz.ch
    >>> 172.31.78.12 --labels
    >>> _admin,nfs,mgr,mds,rgw,rbd,mon,osd,grafana,iscsi
     >>>    27  [2025-01-14-19:55:31] ceph orch host add ceph03.inf.ethz.ch
    >>> 172.31.78.13 --labels
    >>> _admin,nfs,mgr,mds,rgw,rbd,mon,osd,grafana,iscsi
     >>>    28  [2025-01-14-20:11:57] ceph orch host add ceph04.inf.ethz.ch
    >>> 172.31.78.14 --labels
    >>> _admin,nfs,mgr,mds,rgw,rbd,mon,osd,grafana,iscsi
     >>>    29  [2025-01-14-20:23:04] ceph orch host add ceph05.inf.ethz.ch
    >>> 172.31.78.15 --labels
    >>> _admin,nfs,mgr,mds,rgw,rbd,mon,osd,grafana,iscsi
    >>>    30  [2025-01-14-20:23:27] #ceph orch apply osd
    >>> --all-available-devices --unmanaged=true
    >>>    31  [2025-01-14-20:47:24] ceph orch apply osd
    >>> --all-available-devices --unmanaged=true
    >>>    32  [2025-01-14-21:24:41] ls
    >>>    33  [2025-01-14-21:24:47] cat initial_cephadm_out
    >>>
    >>>
    >>>
    >>>
    >>>
    >>> On 3/10/25 09:30, Eugen Block wrote:
    >>>> (Don't drop the list in your responses)
    >>>> There's a lot going on in the cephadm.log, it's not clear which
    >>>> of these are relevant to the issue. This one looks bad though:
    >>>>
    >>>> ValueError: "'xwork.MethodAccessor.denyMethodExecution'" does not
    >>>> appear to be an IPv4 or IPv6 address
    >>>>
    >>>> Did you build this cluster with 19.2.1 or did you upgrade? Do you
    >>>> have the output from 'ceph orch host ls' lying around (I know,
    >>>> currently it's not working)? Are you using IPv6? I haven't seen
    >>>> this issue yet.
    >>>>
     >>>> Quoting Moritz Baumann <mo@xxxxxxxxxxxxx>:
    >>>>
     >>>>> On 09.03.2025 at 15:22, Moritz Baumann wrote:
     >>>>>> I don't want to sound stupid, but how do I get the log?
     >>>>>>
     >>>>>> On 09.03.2025 at 11:11, Eugen Block wrote:
    >>>>>>> ceph versions
    >>>>>>
    >>>>>
    >>>>>
    >>>>> ceph01[22]:~# cephadm shell -- ceph log last cephadm
    >>>>> Inferring fsid 4cfb1138-d2a5-11ef-87eb-1c34da408e52
     >>>>> Inferring config /var/lib/ceph/4cfb1138-d2a5-11ef-87eb-1c34da408e52/mon.ceph01/config
     >>>>> Not using image 'f2efb0401a30ec7eda97b6da76b314bd081fcb910cc5dcd826bc7c72c9dfdd7d'
     >>>>> as it's not in list of non-dangling images with ceph=True label
     >>>>> 2025-02-25T08:42:54.590667+0000 mgr.ceph01.ombogc (mgr.180813)
     >>>>> 1706 : cephadm [INF] maintenance mode request for
     >>>>> ceph05.inf.ethz.ch has SET the noout group
     >>>>> 2025-02-25T08:43:38.380838+0000 mgr.ceph01.ombogc (mgr.180813)
     >>>>> 1729 : cephadm [ERR] Detected invalid grafana certificate on
     >>>>> host ceph05.inf.ethz.ch: Invalid certificate key: [('PEM
     >>>>> routines', '', 'no start line')]
     >>>>> 2025-02-25T08:44:40.343420+0000 mgr.ceph01.ombogc (mgr.180813)
     >>>>> 1762 : cephadm [INF] Adjusting osd_memory_target on
     >>>>> ceph04.inf.ethz.ch to 33423M
     >>>>> 2025-02-25T08:44:40.860933+0000 mgr.ceph01.ombogc (mgr.180813)
     >>>>> 1764 : cephadm [ERR] Detected invalid grafana certificate on
     >>>>> host ceph05.inf.ethz.ch: Invalid certificate key: [('PEM
     >>>>> routines', '', 'no start line')]
     >>>>> 2025-02-25T08:45:41.993564+0000 mgr.ceph01.ombogc (mgr.180813)
     >>>>> 1797 : cephadm [ERR] Detected invalid grafana certificate on
     >>>>> host ceph05.inf.ethz.ch: Invalid certificate key: [('PEM
     >>>>> routines', '', 'no start line')]
     >>>>> 2025-02-25T08:46:42.758717+0000 mgr.ceph01.ombogc (mgr.180813)
     >>>>> 1830 : cephadm [ERR] Detected invalid grafana certificate on
     >>>>> host ceph05.inf.ethz.ch: Invalid certificate key: [('PEM
     >>>>> routines', '', 'no start line')]
     >>>>> 2025-02-25T08:47:44.190946+0000 mgr.ceph01.ombogc (mgr.180813)
     >>>>> 1864 : cephadm [ERR] Unable to reach remote host
     >>>>> ceph05.inf.ethz.ch. SSH connection closed
    >>>>> Traceback (most recent call last):
    >>>>>   File "/usr/share/ceph/mgr/cephadm/ssh.py", line 245, in
    >>>>> _execute_command
    >>>>>     r = await conn.run(str(rcmd), input=stdin)
    >>>>>   File "/lib/python3.9/site-packages/asyncssh/connection.py",
    >>>>> line 4250, in run
    >>>>>     process = await self.create_process(*args, **kwargs) #
   type: ignore
    >>>>>   File "/lib/python3.9/site-packages/asyncssh/connection.py",
    >>>>> line 4128, in create_process
    >>>>>     chan, process = await self.create_session(
    >>>>>   File "/lib/python3.9/site-packages/asyncssh/connection.py",
    >>>>> line 4018, in create_session
    >>>>>     chan = SSHClientChannel(self, self._loop, encoding, errors,
    >>>>>   File "/lib/python3.9/site-packages/asyncssh/channel.py", line
    >>>>> 1085, in __init__
    >>>>>     super().__init__(conn, loop, encoding, errors, window,
   max_pktsize)
    >>>>>   File "/lib/python3.9/site-packages/asyncssh/channel.py", line
    >>>>> 140, in __init__
    >>>>>     self._recv_chan: Optional[int] = conn.add_channel(self)
    >>>>>   File "/lib/python3.9/site-packages/asyncssh/connection.py",
    >>>>> line 1307, in add_channel
    >>>>>     raise ChannelOpenError(OPEN_CONNECT_FAILED,
    >>>>> asyncssh.misc.ChannelOpenError: SSH connection closed
    >>>>>
    >>>>> During handling of the above exception, another exception
   occurred:
    >>>>>
    >>>>> Traceback (most recent call last):
    >>>>>   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line
    >>>>> 138, in wrapper
    >>>>>     return OrchResult(f(*args, **kwargs))
    >>>>>   File "/usr/share/ceph/mgr/cephadm/module.py", line 162, in
   wrapper
    >>>>>     return func(*args, **kwargs)
    >>>>>   File "/usr/share/ceph/mgr/cephadm/module.py", line 2224, in
    >>>>> exit_host_maintenance
    >>>>>     outs, errs, _code = self.wait_async(
    >>>>>   File "/usr/share/ceph/mgr/cephadm/module.py", line 815, in
   wait_async
    >>>>>     return self.event_loop.get_result(coro, timeout)
    >>>>>   File "/usr/share/ceph/mgr/cephadm/ssh.py", line 136, in
   get_result
    >>>>>     return future.result(timeout)
    >>>>>   File "/lib64/python3.9/concurrent/futures/_base.py", line 446,
    >>>>> in result
    >>>>>     return self.__get_result()
    >>>>>   File "/lib64/python3.9/concurrent/futures/_base.py", line 391,
    >>>>> in __get_result
    >>>>>     raise self._exception
    >>>>>   File "/usr/share/ceph/mgr/cephadm/serve.py", line 1669, in
   _run_cephadm
    >>>>>     python = await self.mgr.ssh._check_execute_command(host,
    >>>>> cmd, addr=addr)
    >>>>>   File "/usr/share/ceph/mgr/cephadm/ssh.py", line 300, in
    >>>>> _check_execute_command
    >>>>>     out, err, code = await self._execute_command(host, cmd,
    >>>>> stdin, addr, log_command)
    >>>>>   File "/usr/share/ceph/mgr/cephadm/ssh.py", line 253, in
    >>>>> _execute_command
    >>>>>     raise HostConnectionError(f'Unable to reach remote host
    >>>>> {host}. {str(e)}', host, address)
     >>>>> cephadm.ssh.HostConnectionError: Unable to reach remote host
     >>>>> ceph05.inf.ethz.ch. SSH connection closed
     >>>>> 2025-02-25T08:47:44.762770+0000 mgr.ceph01.ombogc (mgr.180813)
     >>>>> 1865 : cephadm [ERR] Detected invalid grafana certificate on
     >>>>> host ceph05.inf.ethz.ch: Invalid certificate key: [('PEM
     >>>>> routines', '', 'no start line')]
     >>>>> 2025-02-25T08:47:55.631018+0000 mgr.ceph01.ombogc (mgr.180813)
     >>>>> 1874 : cephadm [INF] exit maintenance request has UNSET for the
     >>>>> noout group on host ceph05.inf.ethz.ch
     >>>>> 2025-02-25T08:48:47.623644+0000 mgr.ceph01.ombogc (mgr.180813)
     >>>>> 1901 : cephadm [INF] Adjusting osd_memory_target on
     >>>>> ceph05.inf.ethz.ch to 32911M
     >>>>> 2025-02-25T08:48:47.827026+0000 mgr.ceph01.ombogc (mgr.180813)
     >>>>> 1903 : cephadm [ERR] Detected invalid grafana certificate on
     >>>>> host ceph05.inf.ethz.ch: Invalid certificate key: [('PEM
     >>>>> routines', '', 'no start line')]
    >>>>> 2025-02-25T08:49:21.108932+0000 mgr.ceph01.ombogc (mgr.180813)
    >>>>> 1925 : cephadm [INF] Schedule redeploy daemon grafana.ceph05
    >>>>> 2025-02-25T08:49:21.292879+0000 mgr.ceph01.ombogc (mgr.180813)
    >>>>> 1926 : cephadm [INF] Regenerating cephadm self-signed grafana
    >>>>> TLS certificates
    >>>>> 2025-02-25T08:49:21.328348+0000 mgr.ceph01.ombogc (mgr.180813)
     >>>>> 1929 : cephadm [INF] Deploying daemon grafana.ceph05 on
     >>>>> ceph05.inf.ethz.ch
     >>>>> 2025-02-25T09:07:00.824841+0000 mgr.ceph01.ombogc (mgr.180813)
     >>>>> 2461 : cephadm [INF] Detected new or changed devices on
     >>>>> ceph05.inf.ethz.ch
     >>>>> 2025-02-25T09:07:01.376245+0000 mgr.ceph01.ombogc (mgr.180813)
     >>>>> 2462 : cephadm [INF] Detected new or changed devices on
     >>>>> ceph04.inf.ethz.ch
    >>>>> 2025-02-28T06:18:29.439683+0000 mgr.ceph01.ombogc (mgr.180813)
    >>>>> 137418 : cephadm [ERR] [28/Feb/2025:06:18:29] ENGINE
    >>>>> ValueError('"\'xwork.MethodAccessor.denyMethodExecution\'" does
    >>>>> not appear to be an IPv4 or IPv6 address')
    >>>>> Traceback (most recent call last):
    >>>>>   File "/lib/python3.9/site-packages/cheroot/server.py", line
    >>>>> 1281, in communicate
    >>>>>     req.parse_request()
    >>>>>   File "/lib/python3.9/site-packages/cheroot/server.py", line
    >>>>> 714, in parse_request
    >>>>>     success = self.read_request_line()
    >>>>>   File "/lib/python3.9/site-packages/cheroot/server.py", line
    >>>>> 815, in read_request_line
    >>>>>     scheme, authority, path, qs, fragment =
   urllib.parse.urlsplit(uri)
    >>>>>   File "/lib64/python3.9/urllib/parse.py", line 510, in urlsplit
    >>>>>     _check_bracketed_host(bracketed_host)
    >>>>>   File "/lib64/python3.9/urllib/parse.py", line 453, in
    >>>>> _check_bracketed_host
    >>>>>     ip = ipaddress.ip_address(hostname) # Throws Value Error if
    >>>>> not IPv6 or IPv4
    >>>>>   File "/lib64/python3.9/ipaddress.py", line 53, in ip_address
    >>>>>     raise ValueError(f'{address!r} does not appear to be an IPv4
    >>>>> or IPv6 address')
    >>>>> ValueError: "'xwork.MethodAccessor.denyMethodExecution'" does
    >>>>> not appear to be an IPv4 or IPv6 address
    >>>>>
    >>>>> 2025-02-28T06:18:29.441452+0000 mgr.ceph01.ombogc (mgr.180813)
    >>>>> 137419 : cephadm [ERR] [28/Feb/2025:06:18:29] ENGINE
    >>>>> ValueError('"\'xwork.MethodAccessor.denyMethodExecution\'" does
    >>>>> not appear to be an IPv4 or IPv6 address')
    >>>>> Traceback (most recent call last):
    >>>>>   File "/lib/python3.9/site-packages/cheroot/server.py", line
    >>>>> 1281, in communicate
    >>>>>     req.parse_request()
    >>>>>   File "/lib/python3.9/site-packages/cheroot/server.py", line
    >>>>> 714, in parse_request
    >>>>>     success = self.read_request_line()
    >>>>>   File "/lib/python3.9/site-packages/cheroot/server.py", line
    >>>>> 815, in read_request_line
    >>>>>     scheme, authority, path, qs, fragment =
   urllib.parse.urlsplit(uri)
    >>>>>   File "/lib64/python3.9/urllib/parse.py", line 510, in urlsplit
    >>>>>     _check_bracketed_host(bracketed_host)
    >>>>>   File "/lib64/python3.9/urllib/parse.py", line 453, in
    >>>>> _check_bracketed_host
    >>>>>     ip = ipaddress.ip_address(hostname) # Throws Value Error if
    >>>>> not IPv6 or IPv4
    >>>>>   File "/lib64/python3.9/ipaddress.py", line 53, in ip_address
    >>>>>     raise ValueError(f'{address!r} does not appear to be an IPv4
    >>>>> or IPv6 address')
    >>>>> ValueError: "'xwork.MethodAccessor.denyMethodExecution'" does
    >>>>> not appear to be an IPv4 or IPv6 address
    >>>>>
    >>>>> 2025-03-07T06:17:28.282574+0000 mgr.ceph01.ombogc (mgr.180813)
    >>>>> 447472 : cephadm [ERR] [07/Mar/2025:06:17:28] ENGINE
    >>>>> ValueError('"\'xwork.MethodAccessor.denyMethodExecution\'" does
    >>>>> not appear to be an IPv4 or IPv6 address')
    >>>>> Traceback (most recent call last):
    >>>>>   File "/lib/python3.9/site-packages/cheroot/server.py", line
    >>>>> 1281, in communicate
    >>>>>     req.parse_request()
    >>>>>   File "/lib/python3.9/site-packages/cheroot/server.py", line
    >>>>> 714, in parse_request
    >>>>>     success = self.read_request_line()
    >>>>>   File "/lib/python3.9/site-packages/cheroot/server.py", line
    >>>>> 815, in read_request_line
    >>>>>     scheme, authority, path, qs, fragment =
   urllib.parse.urlsplit(uri)
    >>>>>   File "/lib64/python3.9/urllib/parse.py", line 510, in urlsplit
    >>>>>     _check_bracketed_host(bracketed_host)
    >>>>>   File "/lib64/python3.9/urllib/parse.py", line 453, in
    >>>>> _check_bracketed_host
    >>>>>     ip = ipaddress.ip_address(hostname) # Throws Value Error if
    >>>>> not IPv6 or IPv4
    >>>>>   File "/lib64/python3.9/ipaddress.py", line 53, in ip_address
    >>>>>     raise ValueError(f'{address!r} does not appear to be an IPv4
    >>>>> or IPv6 address')
    >>>>> ValueError: "'xwork.MethodAccessor.denyMethodExecution'" does
    >>>>> not appear to be an IPv4 or IPv6 address
    >>>>>
    >>>>> 2025-03-09T07:36:00.522250+0000 mgr.ceph01.ombogc (mgr.180813)
    >>>>> 536135 : cephadm [ERR] ALERT: Cannot stop active Mgr daemon,
    >>>>> Please switch active Mgrs with 'ceph mgr fail ceph01.ombogc'
    >>>>> Note: Warnings can be bypassed with the --force flag
    >>>>> Traceback (most recent call last):
    >>>>>   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line
    >>>>> 138, in wrapper
    >>>>>     return OrchResult(f(*args, **kwargs))
    >>>>>   File "/usr/share/ceph/mgr/cephadm/module.py", line 162, in
   wrapper
    >>>>>     return func(*args, **kwargs)
    >>>>>   File "/usr/share/ceph/mgr/cephadm/module.py", line 2165, in
    >>>>> enter_host_maintenance
    >>>>>     raise OrchestratorError(
    >>>>> orchestrator._interface.OrchestratorError: ALERT: Cannot stop
    >>>>> active Mgr daemon, Please switch active Mgrs with 'ceph mgr fail
    >>>>> ceph01.ombogc'
    >>>>> Note: Warnings can be bypassed with the --force flag
    >>>>
    >>>>
    >>>>
    >>
    >>
    >>






_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



