Need help: MDS cluster completely dead!

On 09/05/2014 02:16 PM, Yan, Zheng wrote:
> On Fri, Sep 5, 2014 at 4:05 PM, Florent Bautista <florent at coppint.com> wrote:
>> Firefly :) last release.
>>
>> After a few days, the second MDS is still "stopping" and sometimes
>> consuming CPU... :)
> Try restarting the stopping MDS and run "ceph mds stop 1" again.

"service ceph stop mds" does nothing.

"ceph mds stop 1" returns "Error EEXIST: mds.1 not active (up:stopping)"

>
>> On 09/04/2014 09:13 AM, Yan, Zheng wrote:
>>> which version of MDS are you using?
>>>
>>> On Wed, Sep 3, 2014 at 10:48 PM, Florent Bautista <florent at coppint.com> wrote:
>>>> Hi John and thank you for your answer.
>>>>
>>>> I "solved" the problem by running: ceph mds stop 1
>>>>
>>>> So one MDS is marked as "stopping". A few hours later, it is still
>>>> "stopping" (active process, consuming CPU sometimes).
>>>>
>>>> So the other seems to respond fine to clients...
>>>>
>>>> Multi-MDS is really really really unstable :-D
>>>>
>>>> On 09/03/2014 04:00 PM, John Spray wrote:
>>>>> Hi Florent,
>>>>>
>>>>> The first thing to do is to turn up the logging on the MDS (if you
>>>>> haven't already) -- set "debug mds = 20"
>>>>> http://ceph.com/docs/master/rados/troubleshooting/log-and-debug/#subsystem-log-and-debug-settings
>>>>>
>>>>> Since you say they appear as 'active' in "ceph status", I assume they
>>>>> are running rather than crashing again, but it would be good to log
>>>>> into the MDS servers and check that there really are running ceph-mds
>>>>> processes.  If the MDS daemons are running but apparently
>>>>> unresponsive, you may be able to get a little bit of extra info from
>>>>> the running MDS by doing "ceph daemon mds.<name> <command>", where
>>>>> interesting commands are dump_ops_in_flight, status, and objecter_ops.
>>>>>
>>>>> Hopefully that will give us some clues.
>>>>>
>>>>> Cheers,
>>>>> John
>>>>>
>>>>> On Wed, Sep 3, 2014 at 11:52 AM, Florent Bautista
>>>>> <bautista.florent at gmail.com> wrote:
>>>>>> Hi everyone,
>>>>>>
>>>>>> I use the Ceph Firefly release.
>>>>>>
>>>>>> I had an MDS cluster with only one MDS until yesterday, when I tried to add a
>>>>>> second one to test multi-MDS. I thought I could go back to one MDS whenever I
>>>>>> wanted, but it seems we can't!
>>>>>>
>>>>>> Both crashed last night, and I am unable to get them back today.
>>>>>>
>>>>>> They appear as active in ceph -s, and clients using a 3.16 kernel can mount it, but
>>>>>> no operation can be done: "ls" freezes, the client's load average climbs,
>>>>>> and the MDSes do nothing (no CPU usage, nothing in the logs except some
>>>>>> "mdsload" messages and, after some time: closing stale session client).
>>>>>>
>>>>>> How can I debug this situation and recover my data?
>>>>>>
>>>>>> Thanks a lot.
>>>>>>
>>>>>> _______________________________________________
>>>>>> ceph-users mailing list
>>>>>> ceph-users at lists.ceph.com
>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>>>
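
For reference, the admin-socket checks John suggests in the thread ("ceph daemon mds.<name> <command>" with dump_ops_in_flight, status and objecter_ops) can be collected into a small helper. This is only a sketch: the function name mds_diag_argv, the constant MDS_DIAG_COMMANDS and the daemon name "a" are hypothetical, chosen for illustration. It only builds and prints the command lines; it does not run them, since they have to be executed on the host where the MDS daemon is running.

```python
import shlex

# The three admin-socket commands named in the thread.
MDS_DIAG_COMMANDS = ("dump_ops_in_flight", "status", "objecter_ops")


def mds_diag_argv(mds_name, commands=MDS_DIAG_COMMANDS):
    """Return one argv list per diagnostic command for the given MDS daemon.

    mds_name is the daemon name without the "mds." prefix (hypothetical
    placeholder; use whatever name your daemon actually has).
    """
    return [["ceph", "daemon", f"mds.{mds_name}", cmd] for cmd in commands]


if __name__ == "__main__":
    # "a" is an assumed daemon name, not one taken from the thread.
    for argv in mds_diag_argv("a"):
        print(shlex.join(argv))
```

Each printed line can then be pasted into a shell on the MDS host, or the argv lists can be handed to subprocess.run if you prefer to capture the output from a script.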


