Hi Florent,

The first thing to do is to turn up the logging on the MDS (if you haven't already): set "debug mds = 20".
http://ceph.com/docs/master/rados/troubleshooting/log-and-debug/#subsystem-log-and-debug-settings

Since you say they appear as 'active' in "ceph status", I assume they are running rather than crashing again, but it would be good to log into the MDS servers and check that there really are running ceph-mds processes.

If the MDS daemons are running but apparently unresponsive, you may be able to get a little extra info from the running MDS with "ceph daemon mds.<name> <command>", where interesting commands are dump_ops_in_flight, status and objecter_ops.

Hopefully that will give us some clues.

Cheers,
John

On Wed, Sep 3, 2014 at 11:52 AM, Florent Bautista <bautista.florent at gmail.com> wrote:
> Hi everyone,
>
> I use the Ceph Firefly release.
>
> I had an MDS cluster with only one MDS until yesterday, when I tried to add a
> second one to test multi-MDS. I thought I could go back to one MDS whenever I
> wanted, but it seems we can't!
>
> Both crashed last night, and I am unable to get them back today.
>
> They appear as active in ceph -s, and clients using the 3.16 kernel can mount
> the filesystem, but no operation can be done: "ls" freezes, the client's load
> average keeps climbing, and the MDSes do nothing (no CPU usage, nothing in the
> logs except some "mdsload" messages and, after some time, "closing stale
> session client").
>
> How can I debug this situation and recover my data?
>
> Thank you a lot.
>
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
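The two checks suggested in the reply (confirm that ceph-mds is really running, then query its admin socket) can be sketched as a small shell helper. This is a minimal sketch, assuming a POSIX shell on the MDS host itself, since "ceph daemon" talks to the local admin socket. `dump_mds_state` is a hypothetical helper name. The default command list is the one from the reply, but available admin-socket commands differ between releases, so `ceph daemon mds.<name> help` is the way to see what your daemon actually supports.

```shell
#!/bin/sh
# Sketch only: collect admin-socket diagnostics from an MDS that shows as
# "active" but does not answer requests. Must run on the MDS host, because
# "ceph daemon" connects to the daemon's local admin socket.

# dump_mds_state is a hypothetical helper name, not part of Ceph.
# Usage: dump_mds_state <mds-name> [admin-socket-command ...]
dump_mds_state() {
    name="$1"
    shift
    # Default to the commands suggested in the reply; names may vary by release.
    if [ "$#" -eq 0 ]; then
        set -- dump_ops_in_flight status objecter_ops
    fi

    # First check that a ceph-mds process really exists on this host.
    if command -v pgrep >/dev/null 2>&1 && ! pgrep -x ceph-mds >/dev/null; then
        echo "warning: no ceph-mds process found on this host" >&2
    fi

    for cmd in "$@"; do
        echo "=== ceph daemon mds.$name $cmd ==="
        # Keep going if one command fails (e.g. unknown in this release).
        ceph daemon "mds.$name" "$cmd" || echo "(command failed: $cmd)" >&2
    done
}

# Run only when the script is given an MDS name as its first argument.
if [ "$#" -ge 1 ]; then
    dump_mds_state "$@"
fi
```

For example, saving it as a file (the name is up to you) and running `sh <file> <name> > mds-<name>.txt` on the host that runs mds.<name> gives one text dump per daemon to attach to a follow-up post, alongside the debug-level MDS log.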