(Adding back to the list) We've not seen any slow requests anywhere near
that far behind. Leading up to the crash, the furthest behind I saw any
request was ~90 seconds. Here is the cluster log leading up to the mds
crashes:

http://people.beocat.cis.ksu.edu/~mozes/ceph-mds-crashes-20150415.log

--
Adam

On Thu, Apr 16, 2015 at 1:35 AM, Yan, Zheng <ukernel@xxxxxxxxx> wrote:
> On Thu, Apr 16, 2015 at 10:44 AM, Adam Tygart <mozes@xxxxxxx> wrote:
>> We did that just after Kyle responded to John Spray above. I am
>> rebuilding the kernel now to include dynamic printk support.
>>
>
> Maybe the first crash was caused by a hung request in the MDS. Are there
> warnings like "cluster [WRN] slow request [several thousand or more]
> seconds old, received at ...: client_request(client.734537:23 getattr
> pAsLsXsFs ...)" in your ceph cluster log?
>
> Regards
> Yan, Zheng
>
>> --
>> Adam
>>
>> On Wed, Apr 15, 2015 at 9:37 PM, Yan, Zheng <ukernel@xxxxxxxxx> wrote:
>>> On Thu, Apr 16, 2015 at 10:24 AM, Adam Tygart <mozes@xxxxxxx> wrote:
>>>> I don't have "dynamic_debug" enabled in the currently running kernel,
>>>> so I can't bump the verbosity of the ceph functions. I can rebuild the
>>>> kernel and reboot to enable dynamic_debug, but then we'll have to
>>>> wait until we re-trigger the bug. Attached is the mdsc file.
>>>>
>>>> As for getting the mds back running, we put a route in the faulty
>>>> client to redirect ceph traffic to the loopback device, started the
>>>> mds again, waited for the full startup sequence to finish for the mds,
>>>> and re-set the normal routing. That seemed to clean up the existing
>>>> session and allow the mds to live and the client to reconnect. With
>>>> the above mds requests still pending/hung, of course.
>>>
>>> Did you do this trick before? The trick leaves the client in an ill
>>> state. The MDS will crash again after the client sends another 3M
>>> requests to it.
>>>
>>> Regards
>>> Yan, Zheng
>>>
>>>>
>>>> --
>>>> Adam
>>>>
>>>> On Wed, Apr 15, 2015 at 9:04 PM, Yan, Zheng <ukernel@xxxxxxxxx> wrote:
>>>>> On Thu, Apr 16, 2015 at 9:48 AM, Adam Tygart <mozes@xxxxxxx> wrote:
>>>>>> What is significantly smaller? We have 67 requests in the 16,400,000
>>>>>> range and 250 in the 18,900,000 range.
>>>>>>
>>>>>
>>>>> That explains the crash. Could you help me debug this issue?
>>>>>
>>>>> Send /sys/kernel/debug/ceph/*/mdsc to me.
>>>>>
>>>>> Run "echo module ceph +p > /sys/kernel/debug/dynamic_debug/control"
>>>>> on the cephfs mount machine.
>>>>> Restart the mds and wait until it crashes again.
>>>>> Run "echo module ceph -p > /sys/kernel/debug/dynamic_debug/control"
>>>>> on the cephfs mount machine.
>>>>> Send the kernel messages of the cephfs mount machine to me (they
>>>>> should be in /var/log/kern.log or /var/log/messages).
>>>>>
>>>>> To recover from the crash, you can either force-reset the machine
>>>>> containing the cephfs mount or add "mds wipe sessions = 1" to the mds
>>>>> section of ceph.conf.
>>>>>
>>>>> Regards
>>>>> Yan, Zheng
>>>>>
>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Adam
>>>>>>
>>>>>> On Wed, Apr 15, 2015 at 8:38 PM, Yan, Zheng <ukernel@xxxxxxxxx> wrote:
>>>>>>> On Thu, Apr 16, 2015 at 9:07 AM, Adam Tygart <mozes@xxxxxxx> wrote:
>>>>>>>> We are using 3.18.6-gentoo. Based on that, I was hoping that the
>>>>>>>> kernel bug referred to in the bug report would have been fixed.
>>>>>>>>
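For reference, the client-side check and debug capture that Yan describes in
this thread amount to roughly the following on the cephfs mount machine. This
is only a sketch: the assumption that the request ID is the first column of
the mdsc file, the output file name, and the exact kernel log location are
illustrative, not taken from the thread.

    # List in-flight MDS requests on the CephFS client. A stuck request
    # shows up as an entry whose ID (assumed here to be the first column)
    # is far below all the others.
    cat /sys/kernel/debug/ceph/*/mdsc | sort -n | head

    # Enable verbose logging for the ceph kernel module (requires a
    # kernel built with dynamic_debug support).
    echo module ceph +p > /sys/kernel/debug/dynamic_debug/control

    # ...restart the MDS and wait for it to crash again...

    # Disable the verbose logging again.
    echo module ceph -p > /sys/kernel/debug/dynamic_debug/control

    # Collect the kernel messages; the file location varies by distribution
    # (e.g. /var/log/kern.log or /var/log/messages), or use dmesg.
    dmesg > ceph-client-debug.log

Judging what counts as "significantly smaller" still takes the kind of
comparison Adam makes above (67 requests around 16,400,000 versus 250 around
18,900,000).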
>>>>>>>
>>>>>>> The bug was supposed to be fixed, but you hit the bug again. Could you
>>>>>>> check whether the kernel client has any hung mds requests? (Check
>>>>>>> /sys/kernel/debug/ceph/*/mdsc on the machine that contains the cephfs
>>>>>>> mount, and see whether there is any request whose ID is significantly
>>>>>>> smaller than the other requests' IDs.)
>>>>>>>
>>>>>>> Regards
>>>>>>> Yan, Zheng
>>>>>>>
>>>>>>>> --
>>>>>>>> Adam
>>>>>>>>
>>>>>>>> On Wed, Apr 15, 2015 at 8:02 PM, Yan, Zheng <ukernel@xxxxxxxxx> wrote:
>>>>>>>>> On Thu, Apr 16, 2015 at 5:29 AM, Kyle Hutson <kylehutson@xxxxxxx> wrote:
>>>>>>>>>> Thank you, John!
>>>>>>>>>>
>>>>>>>>>> That was exactly the bug we were hitting. My Google-fu didn't lead me to
>>>>>>>>>> this one.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Here is the bug report: http://tracker.ceph.com/issues/10449. It's a
>>>>>>>>> kernel client bug which causes the session map size to increase without
>>>>>>>>> bound. Which version of the Linux kernel are you using?
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>> Yan, Zheng
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Apr 15, 2015 at 4:16 PM, John Spray <john.spray@xxxxxxxxxx> wrote:
>>>>>>>>>>>
>>>>>>>>>>> On 15/04/2015 20:02, Kyle Hutson wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> I upgraded to 0.94.1 from 0.94 on Monday, and everything had been going
>>>>>>>>>>>> pretty well.
>>>>>>>>>>>>
>>>>>>>>>>>> Then, about noon today, we had an mds crash. And then the failover mds
>>>>>>>>>>>> crashed. And this cascaded through all 4 mds servers we have.
>>>>>>>>>>>>
>>>>>>>>>>>> If I try to start it ('service ceph start mds' on CentOS 7.1), it appears
>>>>>>>>>>>> to be OK for a little while. ceph -w goes through 'replay', 'reconnect',
>>>>>>>>>>>> 'rejoin', 'clientreplay' and 'active', but nearly immediately after
>>>>>>>>>>>> getting to 'active', it crashes again.
>>>>>>>>>>>>
>>>>>>>>>>>> I have the mds log at
>>>>>>>>>>>> http://people.beocat.cis.ksu.edu/~kylehutson/ceph-mds.hobbit01.log
>>>>>>>>>>>>
>>>>>>>>>>>> Possibly, but not necessarily, useful background info:
>>>>>>>>>>>> - Yesterday we took our erasure-coded pool and increased both pg_num and
>>>>>>>>>>>> pgp_num from 2048 to 4096. We still have several objects misplaced (~17%),
>>>>>>>>>>>> but those seem to be continuing to clean themselves up.
>>>>>>>>>>>> - We are in the midst of a large (300+ TB) rsync from our old (non-ceph)
>>>>>>>>>>>> filesystem to this filesystem.
>>>>>>>>>>>> - Before we realized the mds crashes, we had just changed the size of our
>>>>>>>>>>>> metadata pool from 2 to 4.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> It looks like you're seeing http://tracker.ceph.com/issues/10449, which is
>>>>>>>>>>> a situation where the SessionMap object becomes too big for the MDS to
>>>>>>>>>>> save. The cause of it in that case was stuck requests from a misbehaving
>>>>>>>>>>> client running a slightly older kernel.
>>>>>>>>>>>
>>>>>>>>>>> Assuming you're using the kernel client and having a similar problem, you
>>>>>>>>>>> could try to work around this situation by forcibly unmounting the clients
>>>>>>>>>>> while the MDS is offline, such that during clientreplay the MDS will remove
>>>>>>>>>>> them from the SessionMap after timing out, and then the next time it tries
>>>>>>>>>>> to save the map it won't be oversized. If that works, you could then look
>>>>>>>>>>> into getting newer kernels on the clients to avoid hitting the issue again
>>>>>>>>>>> -- the #10449 ticket has some pointers about which kernel changes were
>>>>>>>>>>> relevant.
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>> John
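Putting the two recovery paths discussed above side by side, they look
roughly like this. The mount point and the lazy-unmount fallback are
illustrative assumptions; the "mds wipe sessions" setting and the service
command are the ones quoted in the thread.

    # Option 1 (John): with the MDS offline, force-unmount CephFS on the
    # misbehaving client so its session is dropped during clientreplay.
    # /mnt/cephfs is a placeholder mount point.
    umount -f /mnt/cephfs
    umount -l /mnt/cephfs   # lazy unmount, in case the forced unmount hangs

    # Option 2 (Yan): have the MDS wipe its session table at startup by
    # adding the following to ceph.conf on the MDS node, then starting it
    # (and presumably removing the setting again once the MDS is healthy):
    #
    #   [mds]
    #       mds wipe sessions = 1
    #
    service ceph start mds   # as on the CentOS 7.1 MDS nodes in this thread

Either way, these steps only clear the oversized SessionMap; the longer-term
fix is the client kernel change tracked in the #10449 ticket.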
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com