Actually, the cluster is in an error state, due (I think) to these problems:
ceph -s
  cluster:
    id:     lksdjf
    health: HEALTH_ERR
            18 failed cephadm daemon(s)
            2 filesystems are degraded
            1 filesystem has a failed mds daemon
            1 filesystem is offline
            1 mds daemon damaged
            insufficient standby MDS daemons available

  services:
    mon: 3 daemons, quorum ceph02-hn02,ceph02-hn03,ceph02-hn04 (age 7m)
    mgr: ceph02-hn02.ofencx(active, since 29m), standbys: ceph02-hn03.dxswor
    mds: 0/2 daemons up (1 failed)
    osd: 264 osds: 264 up (since 24h), 263 in (since 2m)

  data:
    volumes: 0/2 healthy, 1 recovering, 1 failed; 1 damaged
    pools:   16 pools, 2177 pgs
    objects: 253.57M objects, 154 TiB
    usage:   338 TiB used, 3.1 PiB / 3.4 PiB avail
    pgs:     2177 active+clean

  io:
    client: 1.2 KiB/s rd, 39 MiB/s wr, 0 op/s rd, 19 op/s wr
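
To drill into the MDS and filesystem errors, these are the next commands I
plan to run (all standard Ceph/cephadm commands, nothing cluster-specific):

  ceph health detail   # expands each HEALTH_ERR item above
  ceph fs status       # per-filesystem ranks and MDS states
  ceph mds stat        # compact summary of the MDS map
  cephadm ls           # run directly on a host; lists that host's daemons
                       # without going through the (hanging) orchestrator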
Below are some entries I found in the logs:
Nov 1 09:47:12 ceph02-hn02 podman[645065]: 2023-11-01 09:47:12.812391169 +0100 CET m=+0.017512707 container died 3158b09500dc0ef3a8ce3282c87c5b4b5aae8d490343cae569752e69776c7683 (image=quay.io/ceph/ceph@sha256:673b48521fd53e1b4bc7dda96335505c4d4b2e13d7bb92bf2e7782e2083094c9, name=ceph-<ceph-id>-mgr-ceph02-hn02-ofencx, GIT_BRANCH=HEAD, ceph=True, GIT_CLEAN=True, org.label-schema.build-date=20230622, org.label-schema.name=CentOS Stream 8 Base Image, GIT_REPO=https://github.com/ceph/ceph-container.git, org.label-schema.schema-version=1.0, CEPH_POINT_RELEASE=-17.2.6, org.label-schema.vendor=CentOS, org.label-schema.license=GPLv2, GIT_COMMIT=e0efdfe8a55d4257c30bd4991364ca6f2fc7e58e, io.buildah.version=1.19.8, maintainer=Guillaume Abrioux <gabrioux@xxxxxxxxxx>, RELEASE=HEAD)
Nov 1 09:47:12 ceph02-hn02 podman[645065]: 2023-11-01 09:47:12.822679389 +0100 CET m=+0.027800907 container remove 3158b09500dc0ef3a8ce3282c87c5b4b5aae8d490343cae569752e69776c7683 (image=quay.io/ceph/ceph@sha256:673b48521fd53e1b4bc7dda96335505c4d4b2e13d7bb92bf2e7782e2083094c9, name=ceph-<ceph-id>-mgr-ceph02-hn02-ofencx, GIT_CLEAN=True, org.label-schema.license=GPLv2, org.label-schema.vendor=CentOS, maintainer=Guillaume Abrioux <gabrioux@xxxxxxxxxx>, GIT_BRANCH=HEAD, GIT_REPO=https://github.com/ceph/ceph-container.git, org.label-schema.name=CentOS Stream 8 Base Image, ceph=True, io.buildah.version=1.19.8, org.label-schema.schema-version=1.0, GIT_COMMIT=e0efdfe8a55d4257c30bd4991364ca6f2fc7e58e, RELEASE=HEAD, CEPH_POINT_RELEASE=-17.2.6, org.label-schema.build-date=20230622)
Nov 1 09:47:12 ceph02-hn02 systemd[1]: ceph-<ceph-id>@mgr.ceph02-hn02.ofencx.service: Main process exited, code=exited, status=137/n/a
Nov 1 09:47:13 ceph02-hn02 systemd[1]: ceph-<ceph-id>@mgr.ceph02-hn02.ofencx.service: Failed with result 'exit-code'.
Nov 1 09:47:13 ceph02-hn02 systemd[1]: ceph-<ceph-id>@mgr.ceph02-hn02.ofencx.service: Consumed 25.730s CPU time.
Nov 1 09:47:23 ceph02-hn02 systemd[1]: ceph-<ceph-id>@mgr.ceph02-hn02.ofencx.service: Scheduled restart job, restart counter is at 3.
Nov 1 09:47:23 ceph02-hn02 systemd[1]: Stopped Ceph mgr.ceph02-hn02.ofencx for <ceph-id>.
Nov 1 09:47:23 ceph02-hn02 systemd[1]: ceph-<ceph-id>@mgr.ceph02-hn02.ofencx.service: Consumed 25.730s CPU time.
Nov 1 09:47:23 ceph02-hn02 systemd[1]: Starting Ceph mgr.ceph02-hn02.ofencx for <ceph-id>...
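
Status 137 is 128+9, i.e. the mgr container is being SIGKILLed and then
restarted by systemd; that often points at the kernel OOM killer or at
something stopping the container from outside. To check that theory I am
looking at the following (standard tools; the unit/daemon names are taken
from the log above, <ceph-id> is the masked fsid):

  # any OOM-killer activity around 09:47?
  dmesg -T | grep -i -e oom -e "killed process"
  # container log of the failing mgr
  cephadm logs --name mgr.ceph02-hn02.ofencx
  # recent journal of the mgr unit
  journalctl -u "ceph-<ceph-id>@mgr.ceph02-hn02.ofencx.service" -n 200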
I am still looking for more information.
Thank you for your answer.
On Wed, Nov 1, 2023 at 5:25 PM Eugen Block <eblock@xxxxxx> wrote:
Hi,
please provide more details about your cluster, especially the 'ceph -s'
output. Is the cluster healthy? Apparently other ceph commands work, but
you could share the mgr logs anyway; maybe the hive mind will find
something. ;-)
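For example, the standard cephadm troubleshooting steps (adjust as needed):

  # raise the cephadm log level and watch the messages live
  ceph config set mgr mgr/cephadm/log_to_cluster_level debug
  ceph -W cephadm --watch-debug
  # or dump the most recent cephadm cluster log entries
  ceph log last cephadm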
Don't forget to mask sensitive data.
Regards,
Eugen
Quoting Dario Graña <dgrana@xxxxxx>:
> Hi everyone!
>
> I have a Ceph cluster running AlmaLinux 9, Podman and Ceph Quincy
> (17.2.6). Since yesterday I have been having some problems; the latest
> one is that the ceph orch command hangs. I have looked through the logs
> but found nothing relevant that would help fix the problem.
> Podman shows the daemons as running, and if I stop one daemon it comes
> back after a few seconds.
> I also tried the *ceph mgr fail* command, and the active mgr changes,
> but ceph orch still does not work. ceph orch pause/resume do not work
> either, and disabling and re-enabling the *cephadm* module didn't help.
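> For reference, these are the exact commands I tried (assuming I am
> reading them back correctly from my shell history):
>
>   ceph mgr fail                      # fail over to the standby mgr
>   ceph orch pause
>   ceph orch resume
>   ceph mgr module disable cephadm
>   ceph mgr module enable cephadm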
>
> Any help in understanding what's going on would be welcome.
>
> Thanks in advance.
>
> --
> Dario Graña
> PIC (Port d'Informació Científica)
> Campus UAB, Edificio D
> E-08193 Bellaterra, Barcelona
> http://www.pic.es
> Avis - Aviso - Legal Notice: http://legal.ifae.es
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx