Hi,
I'm currently running Ceph Luminous 12.2.5.
This morning I tried running Multi MDS with:
ceph fs set <fs_name> max_mds 2
I have 5 MDS servers. After running the above command,
I had 2 active MDSs, 2 standby-active and 1 standby.
When I then tried a failover on one of the active MDSs, a standby-active
went into replay but failed (it was reported as "laggy or crashed").
Memory and CPU usage on that MDS went sky high and it became unresponsive
after some time. I ended up with one active MDS, but was stuck with a
degraded filesystem and warnings about the MDS being behind on trimming.
No additional MDS has become active since then. I tried restarting the
last active MDS (because the filesystem was becoming unresponsive and had
a lot of slow requests) and it never got past replay -> resolve. My MDS
cluster still isn't active... :(
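
For anyone trying to reproduce or diagnose this, the MDS states described
above can be inspected roughly as follows. This is only a sketch: "cephfs"
is a placeholder filesystem name, and the run helper is made up for
illustration. It prints each command and only executes it when the ceph
CLI is installed.

```shell
#!/bin/sh
# Placeholder filesystem name (assumption); override with FS_NAME=<fs_name>.
FS_NAME="${FS_NAME:-cephfs}"

# Hypothetical helper: print the command, run it only if ceph is available.
run() {
    echo "+ $*"
    if command -v ceph >/dev/null 2>&1; then
        "$@"
    fi
}

run ceph status                 # overall cluster and fsmap summary
run ceph health detail          # full text of warnings such as trimming
run ceph fs status "$FS_NAME"   # per-rank state and standby daemons
run ceph mds stat               # compact MDS map (replay, resolve, active)
run ceph fs get "$FS_NAME"      # filesystem settings, including max_mds
```

The per-rank state in the fs status and mds stat output is where
replay and resolve show up.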
What is the "resolve" state? I never saw it in pre-Luminous releases.
Setting debug to 20 doesn't tell me much.
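
One way to raise MDS logging to that level at runtime is through the
daemon's admin socket on the MDS host, sketched below. The daemon name
"a" is a placeholder, and the log path assumes the default cluster name
and log location; adjust both to the actual setup.

```shell
#!/bin/sh
# Placeholder MDS daemon name (assumption); override with MDS_ID=<name>.
MDS_ID="${MDS_ID:-a}"

if command -v ceph >/dev/null 2>&1; then
    # Must be run on the host where this MDS daemon lives (admin socket).
    ceph daemon "mds.${MDS_ID}" config set debug_mds 20
    ceph daemon "mds.${MDS_ID}" config set debug_ms 1
else
    echo "ceph CLI not found; nothing changed"
fi

# Default log path, assuming the cluster is named "ceph".
LOG_FILE="/var/log/ceph/ceph-mds.${MDS_ID}.log"
echo "follow the log with: tail -f ${LOG_FILE}"
```

The admin socket is used here because it does not depend on the MDS
answering requests from the rest of the cluster.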
I also tried removing the multi-MDS setup, but my CephFS cluster won't go
active. How can I get my CephFS up and running again in an active state?
Please help.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com