On Fri, Sep 17, 2021 at 2:32 PM 胡 玮文 <huww98@xxxxxxxxxxx> wrote:
>
> Thank you very much. But the MDS still doesn't go active.

Did you run the command I suggested before or after you executed
`rmfailed` below?

> While trying to resolve this, I ran:
>
> ceph mds rmfailed 0 --yes-i-really-mean-it
>
> ceph mds rmfailed 1 --yes-i-really-mean-it

Oh, that's not good!

...

> Then 3 out of 5 MONs crashed.

What was the crash?

> I was able to keep the MONs up by making MDSMonitor::maybe_resize_cluster
> return false directly with gdb. Then I set max_mds back to 2. Now my MONs
> do not crash.
>
> I've really learnt a lesson from this.
>
> Now I suppose I need to figure out how to undo the "mds rmfailed" command?

There's no CLI to add the ranks back into the failed set. You may be able
to reset your FSMap using `ceph fs reset`, but this should be a last
resort as it's not well tested with multiple ranks (you have ranks 0 and
1), and it's likely you'd lose metadata; see the sketch in the P.S. below.

I will compile an addfailed command in a branch, but you'll need to
download the packages and run it. Please be careful running
hidden/debugging commands.

--
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
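P.S. For completeness, a minimal sketch of the last-resort path above,
assuming your filesystem is named "cephfs" (substitute your own name).
`ceph fs reset` rebuilds the MDS map around rank 0 only, which is why
it risks metadata loss on a filesystem that was running ranks 0 and 1:

  # Inspect the current FSMap first; the failed set appears in the dump.
  ceph fs dump

  # LAST RESORT: resets the filesystem to a single rank (rank 0).
  # "cephfs" is a placeholder for your filesystem's name.
  ceph fs reset cephfs --yes-i-really-mean-it

  # Verify the resulting MDS map state.
  ceph fs get cephfs

Again, prefer waiting for the addfailed branch if you can tolerate the
downtime; reset is the option of last resort here.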