Hello!

Yesterday we found some failed cephadm daemons, which is making it impossible to access our HPC cluster:

# ceph health detail
HEALTH_WARN 3 failed cephadm daemon(s); insufficient standby MDS daemons available
[WRN] CEPHADM_FAILED_DAEMON: 3 failed cephadm daemon(s)
    daemon mds.cephfs.s1.nvopyf on s1.ceph.infra.ufscar.br is in error state
    daemon mds.cephfs.s2.qikxmw on s2.ceph.infra.ufscar.br is in error state
    daemon mds.cftv.s2.anybzk on s2.ceph.infra.ufscar.br is in error state
[WRN] MDS_INSUFFICIENT_STANDBY: insufficient standby MDS daemons available
    have 0; want 1 more

While searching online we found advice to remove the failed MDS daemons, but the data in the affected filesystems is relatively important. We would like to know whether the daemons really need to be removed or whether they can be fixed, and, if we do have to remove them, whether the data will be lost.

Please tell me if you need more information.

Thanks in advance,
André de Freitas Smaira
Federal University of São Carlos - UFSCar
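
P.S. For reference, the removal we found suggested looks something like the commands below (daemon names taken from the health output above). We have not run them yet, since we are not sure whether this is safe for the data:

# ceph orch daemon rm mds.cephfs.s1.nvopyf --force
# ceph orch daemon rm mds.cephfs.s2.qikxmw --force
# ceph orch daemon rm mds.cftv.s2.anybzk --force

If it helps, we can also send the output of the following:

# ceph fs status
# ceph orch ps --daemon-type mds
# cephadm logs --name mds.cephfs.s1.nvopyf    (run on s1.ceph.infra.ufscar.br)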