On Wed, Jun 27, 2018 at 6:16 PM Dennis Kramer (DT) <dennis@xxxxxxxxx> wrote:
>
> Hi,
>
> Currently I'm running Ceph Luminous 12.2.5.
>
> This morning I tried running multi-MDS with:
>
>   ceph fs set <fs_name> max_mds 2
>
> I have 5 MDS servers. After running the above command, I had 2 active
> MDSs, 2 standby-active and 1 standby.
>
> After triggering a failover on one of the active MDSs, a standby-active
> went into replay but then crashed (or at least went laggy). Memory and
> CPU usage on that MDS went sky high and it became unresponsive after
> some time. I ended up with one active MDS, a degraded filesystem, and
> warning messages about the MDS being behind on trimming.
>
> I never got any additional MDS active after that. I tried restarting
> the last active MDS (because the filesystem was becoming unresponsive
> and had a load of slow requests), but it never got past
> replay -> resolve. My MDS cluster still isn't active... :(

What is the 'ceph -w' output?

If you have enabled multi-active MDS, all MDS ranks need to enter the
'resolve' state before they can continue to recover. In resolve, the
surviving ranks reconcile subtree authority and any in-flight cross-MDS
operations after a failure, which is why you only see this state with
multiple active MDSs. I've sketched a few commands at the bottom of this
mail that may help.

> What is the "resolve" state? I have never seen that pre-Luminous.
> Debug on 20 doesn't give me much.
>
> I also tried removing the multi-MDS setup, but my CephFS cluster won't
> go active. How can I get my CephFS up and running again in an active
> state?
>
> Please help.
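To see where each rank is stuck, something like the following should
work on Luminous (<fs_name> is a placeholder for your filesystem name;
an untested sketch):

  # Overall cluster health and a one-line MDS summary
  ceph status
  ceph mds stat

  # Per-rank detail; look for up:replay / up:resolve / up:rejoin
  ceph fs status <fs_name>
  ceph fs dump

Every rank in the map needs a daemon sitting in up:resolve before
recovery can move on, so if one rank has no MDS assigned at all, none
of the others will make progress either.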
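Once all ranks are active again, you can drop back to a single active
MDS. On Luminous that takes two steps; a sketch, assuming rank 1 is the
extra rank:

  # Lower the target number of active MDSs
  ceph fs set <fs_name> max_mds 1

  # On Luminous the surplus rank must be stopped explicitly;
  # this only succeeds once rank 1 is actually active again
  ceph mds deactivate <fs_name>:1

Note that deactivate won't help while the ranks are still stuck in
resolve; recovery has to finish first.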
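As for the "behind on trimming" warnings: the journal has simply grown
past mds_log_max_segments during the outage. As a stopgap you could
raise the limit while the MDS catches up (run on the MDS host; the
value below is only an example):

  # Check the current segment limit via the daemon's admin socket
  ceph daemon mds.<name> config get mds_log_max_segments

  # Temporarily raise it so the warning clears while trimming catches up
  ceph tell mds.<name> injectargs '--mds_log_max_segments=200'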