Hi,

we had one failed OSD in our cluster, which we have replaced. Since then the cluster has been behaving very strangely, and some ceph commands such as ceph crash or ceph orch hang.

Cluster health:

[root@gedasvl98 ~]# ceph -s
  cluster:
    id:     ec9e031a-cd10-11eb-a3c3-005056b7db1f
    health: HEALTH_WARN
            mons gedaopl03,gedasvl98 are using a lot of disk space
            mon gedasvl98 is low on available space
            2 daemons have recently crashed
            911 slow ops, oldest one blocked for 62 sec, daemons [mon.gedaopl03,mon.gedasvl98] have slow ops.

  services:
    mon: 2 daemons, quorum gedasvl98,gedaopl03 (age 27m)
    mgr: gedaopl01.fjpsnc(active, since 44m), standbys: gedaopl03.japugq
    mds: 1/1 daemons up, 1 standby
    osd: 9 osds: 9 up (since 27m), 9 in (since 2h)

  data:
    volumes: 1/1 healthy
    pools:   10 pools, 289 pgs
    objects: 7.19k objects, 39 GiB
    usage:   118 GiB used, 7.7 TiB / 7.8 TiB avail
    pgs:     289 active+clean

  io:
    client:   170 B/s rd, 170 B/s wr, 0 op/s rd, 0 op/s wr

If I understand correctly, the mon containers using a lot of disk space could be a consequence of the failed OSD and unclean PGs. The PGs are all clean again, though, so I would expect the mons to free up that disk space. I have also restarted both mons, but no change.

Then I remembered that I recently changed the IPs of the Ceph nodes using:

ceph orch host set-addr gedaopl01 192.168.30.200
ceph orch host set-addr gedaopl02 192.168.30.201
ceph orch host set-addr gedaopl03 192.168.30.202

I did this mainly because I think I got the network layout wrong when I initially deployed the cluster with cephadm. Our nodes have 3 network ports:

1 x 1 GbE public network 172.28.4.x (used for OS deployment etc.)
1 x 10 GbE Ceph cluster network 192.168.41.x
1 x 10 GbE Ceph public network 192.168.30.x

If I understood correctly, the IPs of the mons should be in the Ceph public network (192.168.30.x). Could the changes I made have caused this trouble?

Best Regards,
Oliver
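
PS: To double-check whether the set-addr changes left the mons on the intended public network, I assume something along these lines would show it (standard Ceph commands, nothing here is specific to our cluster apart from the fact that ceph orch currently hangs for me):

# Addresses the monitors are actually registered with in the monmap
ceph mon dump

# Public/cluster networks the cluster is configured with
ceph config get mon public_network
ceph config get mon cluster_network

# Addresses cephadm has recorded for each host (one of the commands that hangs for me right now)
ceph orch host ls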
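
And regarding the mon disk space warning, a rough sketch of what I would look at (the store.db path is what I believe cephadm uses on the host, so it may differ, and I am not sure compaction is the right fix here):

# Size of the monitor's RocksDB store on this host (cephadm layout; exact path is my assumption)
du -sh /var/lib/ceph/ec9e031a-cd10-11eb-a3c3-005056b7db1f/mon.gedasvl98/store.db

# Ask the monitor to compact its store
ceph tell mon.gedasvl98 compact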