Hi all,

our MDS cluster got degraded after one MDS's cache grew oversized and the daemon crashed. Other MDS daemons followed suit, and they are now stuck in this state:

[root@gnosis ~]# ceph fs status
con-fs2 - 1640 clients
=======
+------+---------+---------+---------------+-------+-------+
| Rank |  State  |   MDS   |    Activity   |  dns  |  inos |
+------+---------+---------+---------------+-------+-------+
|  0   | resolve | ceph-24 |               | 22.1k | 22.0k |
|  1   | resolve | ceph-13 |               |  769k |  758k |
|  2   |  active | ceph-16 | Reqs:    0 /s |  255k |  255k |
|  3   | resolve | ceph-09 |               |  5624 |  5619 |
+------+---------+---------+---------------+-------+-------+
+---------------------+----------+-------+-------+
|         Pool        |   type   |  used | avail |
+---------------------+----------+-------+-------+
|    con-fs2-meta1    | metadata | 1828M | 1767G |
|    con-fs2-meta2    |   data   |   0   | 1767G |
|     con-fs2-data    |   data   | 1363T | 6049T |
| con-fs2-data-ec-ssd |   data   |  239G | 4241G |
|    con-fs2-data2    |   data   | 10.2T | 5499T |
+---------------------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
|   ceph-12   |
|   ceph-08   |
|   ceph-23   |
|   ceph-11   |
+-------------+

I tried setting max_mds to 1, to no avail. How can I get the MDS daemons back up?

Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx