CephFS MDS server stuck in "resolve" state

Hi,

I'm currently running Ceph Luminous 12.2.5.

This morning I tried running Multi MDS with:
ceph fs set <fs_name> max_mds 2

I have 5 MDS servers. After running the above command,
I had 2 active MDSs, 2 standby-active and 1 standby.

After I tried a failover on one of the active MDSs, a standby-active started a replay but crashed (reported as laggy or crashed). Memory and CPU usage on that MDS went sky high, and it became unresponsive after some time. I ended up with one active MDS, a degraded filesystem, and warning messages about the MDS being behind on trimming.

No additional MDS has become active since then. I tried restarting the last active MDS (because the filesystem was becoming unresponsive and had a load of slow requests), and it never got past replay -> resolve. My MDS cluster still isn't active... :(

What is the "resolve" state? I never saw it before Luminous.
Debug on 20 doesn't give me much.
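
For anyone following along, here is a rough sketch of how the state of each MDS rank (replay, resolve, active and so on) can be inspected. The filesystem name "cephfs" and the daemon ID "mds-a" are placeholders, not values from this cluster, and the script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: print (or run) commands commonly used to inspect MDS state.
# FS_NAME and MDS_ID are placeholders (assumptions), not values from this thread.
FS_NAME="${FS_NAME:-cephfs}"
MDS_ID="${MDS_ID:-mds-a}"

# With DRY_RUN=1 (the default) commands are only echoed, so nothing touches the cluster.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

show_mds_state() {
    run ceph fs status "$FS_NAME"          # per-rank state table
    run ceph mds stat                      # one-line summary of the MDS map
    run ceph health detail                 # expands warnings such as "behind on trimming"
    run ceph daemon "mds.$MDS_ID" status   # admin socket; run on the host of that MDS
}

show_mds_state
```

Running it with DRY_RUN=0 on a node with an admin keyring would execute the commands instead of printing them.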

I also tried removing the Multi MDS setup, but my CephFS cluster won't go active. How can I get my CephFS up and running again in an active state?
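
For reference, going back from two active ranks to one on Luminous is usually a two-step operation, sketched below. This is an illustration under assumptions, not a record of the exact commands run here: "cephfs" is a placeholder filesystem name, and the script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: commands for going back from two active MDS ranks to one on Luminous.
# FS_NAME is a placeholder (assumption), not a value from this thread.
FS_NAME="${FS_NAME:-cephfs}"

# With DRY_RUN=1 (the default) commands are only echoed, so nothing touches the cluster.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reduce_to_single_mds() {
    run ceph fs set "$FS_NAME" max_mds 1   # lower the target number of active ranks
    run ceph mds deactivate "$FS_NAME:1"   # Luminous: rank 1 has to be stopped explicitly
    run ceph fs get "$FS_NAME"             # show max_mds and the ranks that remain
}

reduce_to_single_mds
```

The deactivate step presumably only completes when rank 1 is up and able to hand over its metadata, so it may not help while a rank is stuck in resolve.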

Please help.


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


