On Fri, Sep 17, 2021 at 2:32 PM 胡 玮文 <huww98@xxxxxxxxxxx> wrote:
>
> Thank you very much. But the MDS still doesn't go active.

Did you run the command I suggested before or after you executed
`rmfailed` below?

> While trying to resolve this, I ran:
>
> ceph mds rmfailed 0 --yes-i-really-mean-it
>
> ceph mds rmfailed 1 --yes-i-really-mean-it

Oh, that's not good!

...

> Then 3 out of 5 MONs crashed.

What was the crash?

> I was able to keep the MONs up by making MDSMonitor::maybe_resize_cluster
> return false directly with gdb. Then I set max_mds back to 2. Now my MONs
> do not crash.
>
> I've really learnt a lesson from this.
>
> Now I suppose I need to figure out how to undo the "mds rmfailed" command?

There's no CLI to add the ranks back into the failed set. You may be able
to reset your FSMap using `ceph fs reset`, but this should be a last
resort as it's not well tested with multiple ranks (you have ranks 0 and
1), and it's likely you'd lose metadata; see the sketch in the P.S. below.

I will compile an addfailed command in a branch, but you'll need to
download the packages and run it. Please be careful running
hidden/debugging commands.

--
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
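P.S. For completeness, a minimal sketch of the last-resort path above,
assuming your filesystem is named "cephfs" (substitute your own name).
`ceph fs reset` rebuilds the MDS map around rank 0 only, which is why
it risks metadata loss on a filesystem that was running ranks 0 and 1:

  # Inspect the current FSMap first; the failed set appears in the dump.
  ceph fs dump

  # LAST RESORT: resets the filesystem to a single rank (rank 0).
  # "cephfs" is a placeholder for your filesystem's name.
  ceph fs reset cephfs --yes-i-really-mean-it

  # Verify the resulting MDS map state.
  ceph fs get cephfs

Again, prefer waiting for the addfailed branch if you can tolerate the
downtime; reset is the option of last resort here.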