On 09/05/2014 02:16 PM, Yan, Zheng wrote:
> On Fri, Sep 5, 2014 at 4:05 PM, Florent Bautista <florent at coppint.com> wrote:
>> Firefly :) last release.
>>
>> After a few days, the second MDS is still "stopping" and consuming CPU
>> sometimes... :)
>
> Try restarting the stopping MDS and run "ceph mds stop 1" again.

"service ceph stop mds" does nothing.

"ceph mds stop 1" returns "Error EEXIST: mds.1 not active (up:stopping)"

>
>> On 09/04/2014 09:13 AM, Yan, Zheng wrote:
>>> Which version of the MDS are you using?
>>>
>>> On Wed, Sep 3, 2014 at 10:48 PM, Florent Bautista <florent at coppint.com> wrote:
>>>> Hi John, and thank you for your answer.
>>>>
>>>> I "solved" the problem by doing: ceph mds stop 1
>>>>
>>>> So one MDS is marked as "stopping". A few hours later, it is still
>>>> "stopping" (the process is active and sometimes consuming CPU).
>>>>
>>>> The other one seems to respond fine to clients...
>>>>
>>>> Multi-MDS is really, really, really unstable :-D
>>>>
>>>> On 09/03/2014 04:00 PM, John Spray wrote:
>>>>> Hi Florent,
>>>>>
>>>>> The first thing to do is to turn up the logging on the MDS (if you
>>>>> haven't already) -- set "debug mds = 20":
>>>>> http://ceph.com/docs/master/rados/troubleshooting/log-and-debug/#subsystem-log-and-debug-settings
>>>>>
>>>>> Since you say they appear as 'active' in "ceph status", I assume they
>>>>> are running rather than crashing again, but it would be good to log
>>>>> into the MDS servers and check that there really are running ceph-mds
>>>>> processes. If the MDS daemons are running but apparently
>>>>> unresponsive, you may be able to get a little bit of extra info from
>>>>> the running MDS by doing "ceph daemon mds.<name> <command>", where
>>>>> interesting commands are dump_ops_in_flight, status, and objecter_ops.
>>>>>
>>>>> Hopefully that will give us some clues.
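[Editor's note: the diagnostic steps John describes above can be sketched roughly as follows. The daemon name "a" is an assumption (substitute your own MDS name), and these must be run on, or against, the host where the MDS lives.]

```shell
# Raise MDS debug logging to 20 at runtime (alternatively, set
# "debug mds = 20" under [mds] in ceph.conf and restart the daemon).
ceph tell mds.a injectargs '--debug-mds 20'

# Check that a ceph-mds process is actually running on the MDS host.
ps aux | grep '[c]eph-mds'

# Query the live daemon through its admin socket for clues
# (run these on the machine hosting the MDS):
ceph daemon mds.a dump_ops_in_flight   # client ops currently in flight
ceph daemon mds.a objecter_ops         # requests the MDS has outstanding to OSDs
ceph daemon mds.a status               # current state, e.g. up:active, up:stopping
```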
>>>>>
>>>>> Cheers,
>>>>> John
>>>>>
>>>>> On Wed, Sep 3, 2014 at 11:52 AM, Florent Bautista
>>>>> <bautista.florent at gmail.com> wrote:
>>>>>> Hi everyone,
>>>>>>
>>>>>> I use the Ceph Firefly release.
>>>>>>
>>>>>> I had an MDS cluster with only one MDS until yesterday, when I tried to add a
>>>>>> second one to test multi-MDS. I thought I could get back to one MDS whenever I
>>>>>> wanted, but it seems we can't!
>>>>>>
>>>>>> Both crashed last night, and I am unable to get them back today.
>>>>>>
>>>>>> They appear as active in "ceph -s", and clients using a 3.16 kernel can mount, but no
>>>>>> operation can be done: "ls" freezes, the load average on the client climbs,
>>>>>> and the MDSes do nothing (no CPU usage, nothing in the logs except some
>>>>>> "mdsload" messages and, after some time, "closing stale session client").
>>>>>>
>>>>>> What can I do to debug this situation and recover my data?
>>>>>>
>>>>>> Thank you very much.
>>>>>>
>>>>>> _______________________________________________
>>>>>> ceph-users mailing list
>>>>>> ceph-users at lists.ceph.com
>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
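[Editor's note: for readers hitting the same problem, the usual Firefly-era sequence for going back from two active MDS ranks to one is to lower max_mds before stopping the extra rank. A hedged sketch, assuming rank 1 is the one being retired:]

```shell
# Tell the cluster that only one active MDS rank should exist.
ceph mds set_max_mds 1

# Ask rank 1 to hand its state back to rank 0 and shut down; it passes
# through up:stopping and should eventually disappear from "ceph -s".
ceph mds stop 1

# Watch progress; rank 1 can sit in up:stopping for a long time if it
# cannot flush its journal or export its subtrees back to rank 0.
ceph mds stat
```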