Hi Noe,

If the MDS has failed and you're sure there are no pending tasks or sessions
associated with the failed MDS, you can try `ceph mds rmfailed`. Be careful,
though: only use it if that MDS really is doing nothing and isn't linked to
any file system, otherwise things can go wrong and you can end up with an
inaccessible file system. More information on the command can be found at [0]
and [1]; a rough sketch of the invocation is at the bottom of this mail.

[0] https://docs.ceph.com/en/quincy/man/8/ceph/
[1] https://docs.ceph.com/en/latest/cephfs/administration/#advanced

--
Dhairya Parmar
Associate Software Engineer, CephFS
IBM, Inc. <https://www.redhat.com/>

On Wed, May 29, 2024 at 4:24 PM Noe P. <ml@am-rand.berlin> wrote:

> Hi,
>
> after our disaster yesterday, it seems that we got our MONs back.
> One of the filesystems, however, seems to be in a strange state:
>
> % ceph fs status
>
> ....
> fs_cluster - 782 clients
> ==========
> RANK  STATE    MDS       ACTIVITY     DNS    INOS   DIRS   CAPS
>  0    active   cephmd6a  Reqs: 5 /s   13.2M  13.2M  1425k  51.4k
>  1    failed
>       POOL             TYPE      USED   AVAIL
> fs_cluster_meta        metadata  3594G  53.5T
> fs_cluster_data        data      421T   53.5T
> ....
> STANDBY MDS
>   cephmd6b
>   cephmd4b
> MDS version: ceph version 17.2.7
> (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable)
>
>
> % ceph fs dump
> ....
> Filesystem 'fs_cluster' (3)
> fs_name fs_cluster
> epoch 3068261
> flags 12 joinable allow_snaps allow_multimds_snaps
> created 2022-08-26T15:55:07.186477+0200
> modified 2024-05-29T12:43:30.606431+0200
> tableserver 0
> root 0
> session_timeout 60
> session_autoclose 300
> max_file_size 4398046511104
> required_client_features {}
> last_failure 0
> last_failure_osd_epoch 1777109
> compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable
> ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds
> uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline
> data,8=no anchor table,9=file layout v2,10=snaprealm v2}
> max_mds 2
> in 0,1
> up {0=911794623}
> failed
> damaged
> stopped 2,3
> data_pools [32]
> metadata_pool 33
> inline_data disabled
> balancer
> standby_count_wanted 1
> [mds.cephmd6a{0:911794623} state up:active seq 44701 addr [v2:
> 10.13.5.6:6800/189084355,v1:10.13.5.6:6801/189084355] compat
> {c=[1],r=[1],i=[7ff]}]
>
>
> We would like to get rid of the failed rank 1 (without crashing the MONs)
> and have a 2nd MDS from the standbys step in.
>
> Anyone have an idea how to do this?
> I'm a bit reluctant to try 'ceph mds rmfailed', as this seems to have
> triggered the MONs to crash.
>
> Regards,
> Noe
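
For reference, this is roughly what the invocation would look like on your
cluster (taking the file system name fs_cluster and the failed rank 1 from
your output; I'm writing this from memory of the quincy CLI, so please
double-check the syntax against [0] before running anything):

    # only once you are certain nothing is associated with the failed rank;
    # the command refuses to run without the confirmation flag
    % ceph mds rmfailed fs_cluster:1 --yes-i-really-mean-it

    # then check whether one of the standbys gets promoted to rank 1
    % ceph fs status fs_cluster

Treat this as a sketch rather than a recipe; if anything still looks
associated with that rank, stop and investigate first.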