Actually, the cluster is in an error state due to (I think) these problems.

ceph -s
  cluster:
    id:     lksdjf
    health: HEALTH_ERR
            18 failed cephadm daemon(s)
            2 filesystems are degraded
            1 filesystem has a failed mds daemon
            1 filesystem is offline
            1 mds daemon damaged
            insufficient standby MDS daemons available

  services:
    mon: 3 daemons, quorum ceph02-hn02,ceph02-hn03,ceph02-hn04 (age 7m)
    mgr: ceph02-hn02.ofencx(active, since 29m), standbys: ceph02-hn03.dxswor
    mds: 0/2 daemons up (1 failed)
    osd: 264 osds: 264 up (since 24h), 263 in (since 2m)

  data:
    volumes: 0/2 healthy, 1 recovering, 1 failed; 1 damaged
    pools:   16 pools, 2177 pgs
    objects: 253.57M objects, 154 TiB
    usage:   338 TiB used, 3.1 PiB / 3.4 PiB avail
    pgs:     2177 active+clean

  io:
    client:   1.2 KiB/s rd, 39 MiB/s wr, 0 op/s rd, 19 op/s wr

Some outputs I found in the logs are below:

Nov 1 09:47:12 ceph02-hn02 podman[645065]: 2023-11-01 09:47:12.812391169 +0100 CET m=+0.017512707 container died 3158b09500dc0ef3a8ce3282c87c5b4b5aae8d490343cae569752e69776c7683 (image=quay.io/ceph/ceph@sha256:673b48521fd53e1b4bc7dda96335505c4d4b2e13d7bb92bf2e7782e2083094c9, name=ceph-<ceph-id>-mgr-ceph02-hn02-ofencx, GIT_BRANCH=HEAD, ceph=True, GIT_CLEAN=True, org.label-schema.build-date=20230622, org.label-schema.name=CentOS Stream 8 Base Image, GIT_REPO=https://github.com/ceph/ceph-container.git, org.label-schema.schema-version=1.0, CEPH_POINT_RELEASE=-17.2.6, org.label-schema.vendor=CentOS, org.label-schema.license=GPLv2, GIT_COMMIT=e0efdfe8a55d4257c30bd4991364ca6f2fc7e58e, io.buildah.version=1.19.8, maintainer=Guillaume Abrioux <gabrioux@xxxxxxxxxx>, RELEASE=HEAD)
Nov 1 09:47:12 ceph02-hn02 podman[645065]: 2023-11-01 09:47:12.822679389 +0100 CET m=+0.027800907 container remove 3158b09500dc0ef3a8ce3282c87c5b4b5aae8d490343cae569752e69776c7683 (image=quay.io/ceph/ceph@sha256:673b48521fd53e1b4bc7dda96335505c4d4b2e13d7bb92bf2e7782e2083094c9, name=ceph-<ceph-id>-mgr-ceph02-hn02-ofencx, GIT_CLEAN=True, org.label-schema.license=GPLv2, org.label-schema.vendor=CentOS, maintainer=Guillaume Abrioux <gabrioux@xxxxxxxxxx>, GIT_BRANCH=HEAD, GIT_REPO=https://github.com/ceph/ceph-container.git, org.label-schema.name=CentOS Stream 8 Base Image, ceph=True, io.buildah.version=1.19.8, org.label-schema.schema-version=1.0, GIT_COMMIT=e0efdfe8a55d4257c30bd4991364ca6f2fc7e58e, RELEASE=HEAD, CEPH_POINT_RELEASE=-17.2.6, org.label-schema.build-date=20230622)
Nov 1 09:47:12 ceph02-hn02 systemd[1]: ceph-<ceph-id>@mgr.ceph02-hn02.ofencx.service: Main process exited, code=exited, status=137/n/a
Nov 1 09:47:13 ceph02-hn02 systemd[1]: ceph-<ceph-id>@mgr.ceph02-hn02.ofencx.service: Failed with result 'exit-code'.
Nov 1 09:47:13 ceph02-hn02 systemd[1]: ceph-<ceph-id>@mgr.ceph02-hn02.ofencx.service: Consumed 25.730s CPU time.
Nov 1 09:47:23 ceph02-hn02 systemd[1]: ceph-<ceph-id>@mgr.ceph02-hn02.ofencx.service: Scheduled restart job, restart counter is at 3.
Nov 1 09:47:23 ceph02-hn02 systemd[1]: Stopped Ceph mgr.ceph02-hn02.ofencx for <ceph-id>.
Nov 1 09:47:23 ceph02-hn02 systemd[1]: ceph-<ceph-id>@mgr.ceph02-hn02.ofencx.service: Consumed 25.730s CPU time.
Nov 1 09:47:23 ceph02-hn02 systemd[1]: Starting Ceph mgr.ceph02-hn02.ofencx for <ceph-id>...

I am still looking for more information. Thank you for your answer.
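In case it helps, this is roughly what I plan to check next. It is only a sketch: the systemd unit name is copied from the log lines above, the --since date is just an example, and I am assuming (not yet confirmed) that exit status 137 means the mgr container was killed with SIGKILL, possibly by the kernel OOM killer.

# exit status 137 = 128 + 9 (SIGKILL); look for OOM-killer activity on the mgr host
journalctl -k | grep -iE 'out of memory|oom'

# restart history of the failing mgr unit (unit name taken from the messages above)
journalctl -u ceph-<ceph-id>@mgr.ceph02-hn02.ofencx.service --since "2023-11-01"

# daemons cephadm has deployed on this host (run locally as root)
cephadm ls

# more detail on the failed cephadm daemons and the damaged/offline filesystem
ceph health detail
ceph fs status
ceph fs dump | grep -i damaged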
On Wed, Nov 1, 2023 at 5:25 PM Eugen Block <eblock@xxxxxx> wrote:

> Hi,
>
> please provide more details about your cluster, especially the 'ceph -s'
> output. Is the cluster healthy? Apparently, other ceph commands work,
> but you could share the mgr logs anyway, maybe the hive mind finds
> something. ;-)
> Don't forget to mask sensitive data.
>
> Regards,
> Eugen
>
> Quoting Dario Graña <dgrana@xxxxxx>:
>
> > Hi everyone!
> >
> > I have a Ceph cluster running AlmaLinux 9, podman and Ceph Quincy (17.2.6).
> > Since yesterday I have been having some problems; the latest one is that
> > the ceph orch command hangs. I have checked the logs but didn't find
> > anything relevant that could help fix the problem.
> > Podman shows the daemons running, and if I stop one daemon it comes back
> > after a few seconds.
> > I also tried the *ceph mgr fail* command; the daemon order changes, but
> > ceph orch is still not working. ceph orch pause/resume are not working
> > either. Disabling and re-enabling the *cephadm* module didn't help.
> >
> > Any help to understand what's going on would be welcome.
> >
> > Thanks in advance.
> >
> > --
> > Dario Graña
> > PIC (Port d'Informació Científica)
> > Campus UAB, Edificio D
> > E-08193 Bellaterra, Barcelona
> > http://www.pic.es
> > Avis - Aviso - Legal Notice: http://legal.ifae.es
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx