Re: FS down - mds degraded

Patrick Donnelly <pdonnell@xxxxxxxxxx> · Thu, 21 Dec 2023 11:26:14 -0500

On Thu, Dec 21, 2023 at 3:05 AM Sake Ceph <ceph@xxxxxxxxxxx> wrote:
>
> Hi David
>
> Reducing max_mds didn't work. So I executed a fs reset:
> ceph fs set atlassian-prod allow_standby_replay false
> ceph fs set atlassian-prod cluster_down true
> ceph mds fail atlassian-prod.pwsoel13142.egsdfl
> ceph mds fail atlassian-prod.pwsoel13143.qlvypn
> ceph fs reset atlassian-prod
> ceph fs reset atlassian-prod --yes-i-really-mean-it
>
> This brought the fs back online and the servers/applications are working again.

This was not the right thing to do. You can mark the rank repaired
instead. See the end of:

https://docs.ceph.com/en/latest/cephfs/administration/#daemons

(ceph mds repaired <role>)
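
As a rough sketch of what that looks like in practice (not a tested
procedure for this cluster): the rank number 0 below is an assumption,
since the affected rank is not stated in this thread, and the `run`
helper and DRY_RUN variable are hypothetical conveniences that print
each command instead of executing it unless DRY_RUN=0 is set. The role
is written as <fs_name>:<rank>.

```shell
#!/bin/sh
# Sketch only: RANK=0 is an assumed value; check which rank is
# actually affected (e.g. in "ceph health detail") before running.
FS_NAME="atlassian-prod"
RANK=0
ROLE="${FS_NAME}:${RANK}"

# Hypothetical helper: print the command, and execute it only when
# DRY_RUN is explicitly set to 0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" != "1" ]; then
        "$@"
    fi
}

# Inspect the current health and file system state first.
run ceph health detail
run ceph fs status "$FS_NAME"

# Mark the rank repaired so a standby MDS can take it over.
run ceph mds repaired "$ROLE"

# Check the file system state again afterwards.
run ceph fs status "$FS_NAME"
```

With the default dry run this only echoes the commands, so the role
can be reviewed before anything touches the cluster.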

I admit that is not easy to find. I will open a ticket to improve the
documentation:

https://tracker.ceph.com/issues/63885

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D