Re: How to recover from an MDS rank in state 'failed'

I'm not really sure either; what about this?

ceph mds repaired <rank>

The docs state:

Mark the file system rank as repaired. Unlike the name suggests, this command does not change a MDS; it manipulates the file system rank which has been marked damaged.
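
In your case that would presumably be something like the following (just a sketch based on your 'ceph fs status' output, where rank 1 of fs_cluster is the one shown as failed; I haven't verified this against a cluster in that state):

ceph mds repaired fs_cluster:1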

Maybe that could bring it back up? Did you set max_mds to 1 at some point? If you set it to 1 now (while you currently have only one active MDS), maybe that would clean up the failed rank as well?
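
Untested sketch of what I mean, with the filesystem name taken from your output:

# check the current value
ceph fs get fs_cluster | grep max_mds

# drop to a single rank; if the failed rank 1 gets cleaned up,
# raise max_mds again so a standby can take rank 1
ceph fs set fs_cluster max_mds 1
ceph fs set fs_cluster max_mds 2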


Quoting "Noe P." <ml@am-rand.berlin>:

Hi,

after our disaster yesterday, it seems that we got our MONs back.
One of the filesystems, however, seems to be in a strange state:

  % ceph fs status

  ....
  fs_cluster - 782 clients
  ==========
  RANK  STATE     MDS        ACTIVITY     DNS    INOS   DIRS   CAPS
   0    active  cephmd6a  Reqs:    5 /s  13.2M  13.2M  1425k  51.4k
   1    failed
        POOL         TYPE     USED  AVAIL
  fs_cluster_meta  metadata  3594G  53.5T
  fs_cluster_data    data     421T  53.5T
  ....
  STANDBY MDS
    cephmd6b
    cephmd4b
MDS version: ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable)


  % ceph fs dump
  ....
  Filesystem 'fs_cluster' (3)
  fs_name fs_cluster
  epoch   3068261
  flags   12 joinable allow_snaps allow_multimds_snaps
  created 2022-08-26T15:55:07.186477+0200
  modified        2024-05-29T12:43:30.606431+0200
  tableserver     0
  root    0
  session_timeout 60
  session_autoclose       300
  max_file_size   4398046511104
  required_client_features        {}
  last_failure    0
  last_failure_osd_epoch  1777109
compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table,9=file layout v2,10=snaprealm v2}
  max_mds 2
  in      0,1
  up      {0=911794623}
  failed
  damaged
  stopped 2,3
  data_pools      [32]
  metadata_pool   33
  inline_data     disabled
  balancer
  standby_count_wanted    1
[mds.cephmd6a{0:911794623} state up:active seq 44701 addr [v2:10.13.5.6:6800/189084355,v1:10.13.5.6:6801/189084355] compat {c=[1],r=[1],i=[7ff]}]


We would like to get rid of the failed rank 1 (without crashing the MONs)
and have a second MDS from the standbys step in.

Does anyone have an idea how to do this?
I'm a bit reluctant to try 'ceph mds rmfailed', as that seems to be what
triggered the MON crashes.

Regards,
  Noe

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


