I actually managed to reboot everything today, and it has run smoothly for the last few minutes. MDS failover also worked without problems. If anything bad happens in the next few days, I will let you know.

Markus

On Tue, May 19, 2015 at 1:12 PM, Markus Blank-Burian <burian@xxxxxxxxxxx> wrote:
> Thanks for the patch! Testing might take up to a week, since I have to
> reboot all the client nodes in the computing cluster.
>
> On Tue, May 19, 2015 at 12:27 PM, Yan, Zheng <ukernel@xxxxxxxxx> wrote:
>> Could you try the attached patch?
>>
>> On Tue, May 19, 2015 at 5:10 PM, Markus Blank-Burian <burian@xxxxxxxxxxx> wrote:
>>> Forgot the attachments. Besides, is there any way to get the cluster
>>> running again without restarting all the client nodes?
>>>
>>> On Tue, May 19, 2015 at 10:45 AM, Yan, Zheng <ukernel@xxxxxxxxx> wrote:
>>>> On Tue, May 19, 2015 at 4:31 PM, Markus Blank-Burian <burian@xxxxxxxxxxx> wrote:
>>>>> I am afraid I hit the same bug. Giant worked fine, but after upgrading to
>>>>> hammer (0.94.1) and putting some load on it, the MDSs eventually crashed, and
>>>>> now I am stuck in clientreplay most of the time. I am also using the cephfs
>>>>> kernel client (3.18.y). Since I didn't find a corresponding tracker entry: is
>>>>> there already a patch available?
>>>>>
>>>>
>>>> Please send the MDS log and /sys/kernel/debug/ceph/*/mdsc on the client
>>>> machine to us. Besides, are there warnings like "cluster [WRN] slow
>>>> request [several thousands or more ] seconds old, received at ...:
>>>> client_request(client.734537:23 ...)" in your ceph cluster log?
>>>>
>>>> Regards
>>>> Yan, Zheng

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
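[For anyone hitting the same stuck-in-clientreplay state: the information Yan asks for in the quoted message can be gathered with something like the sketch below. The cluster log path (/var/log/ceph/ceph.log) and the debugfs mount point (/sys/kernel/debug) are assumptions based on default deployments; adjust them for your setup, and note the mdsc files only exist when debugfs is mounted and the kernel ceph module is loaded.]

```shell
#!/bin/sh
# Sketch for collecting MDS debug info on a client node.
# Assumption: default cluster log location; override via CEPH_LOG if needed.
LOG=${CEPH_LOG:-/var/log/ceph/ceph.log}

# 1. Check the cluster log for the slow-request warnings Yan mentions.
grep 'slow request' "$LOG" 2>/dev/null

# 2. Dump the in-flight MDS requests for each kernel client mount.
#    A non-empty mdsc file means the client is still waiting on the MDS.
for f in /sys/kernel/debug/ceph/*/mdsc; do
    [ -e "$f" ] || continue   # glob did not match; no ceph mounts with debugfs
    echo "== $f =="
    cat "$f"
done
```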