Re: MDS crash

On Tue, Aug 16, 2016 at 6:29 AM, Randy Orr <randy.orr@xxxxxxxxxx> wrote:
> Hi Patrick,
>
> We continue to hit this bug. Just a couple of questions:
>
> 1. I see that http://tracker.ceph.com/issues/16983 has been updated and you
> believe it is related to http://tracker.ceph.com/issues/16013. It looks like
> this fix is scheduled to be backported to Jewel at some point... is there
> any sense of when that might happen and when a point release will be made?
>
> 2. Looking at the pull request: https://github.com/ceph/ceph/pull/8778 I ran
> through the testing steps that were posted and was unable to replicate the
> crash.
>
> 3. When we do hit this condition, what is the best way to recover? I can
> continue to restart the MDS services and reboot the hosts, but the condition
> remains for some period of time. Even after blacklisting all clients the
> condition persists. It's actually unclear to me how/why this is recovering
> at all. If it will be some period of time before the fix is released is
> there any workaround or temporary solution?
>

The easiest solution is to replace the ceph-mds daemon with the patched
version. Using the fuse client also avoids this issue (because only the
kernel client can trigger this bug).
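
For anyone digging into the backtrace quoted below: the abort comes from the
assert(in->is_file()) check in Locker::check_inode_max_size(), reached from
Server::handle_client_open(). The following is a minimal, self-contained C++
sketch of that failure mode, reconstructed only from the symbols in the trace;
it is not the real Ceph code, and the simplified CInode stand-in is purely
illustrative:

#include <cassert>

// Minimal stand-ins reconstructed only from the symbols in the backtrace
// below; this is NOT the real Ceph source, just an illustration of the
// failure mode.
struct CInode {
  bool regular_file = false;        // hypothetical flag standing in for the inode type
  bool is_file() const { return regular_file; }
};

// Stands in for Locker::check_inode_max_size() (mds/Locker.cc:2190).
bool check_inode_max_size(CInode *in)
{
  // The MDS expects Server::handle_client_open() to hand only regular
  // files to this path. If a kernel client gets a non-regular inode here,
  // the assert fails and the whole ceph-mds daemon aborts, which is the
  // crash shown in the trace.
  assert(in->is_file());
  return true;
}

int main()
{
  CInode in;                        // pretend this is a directory or symlink inode
  check_inode_max_size(&in);        // aborts, mirroring the reported crash
  return 0;
}

The point is that the assertion fires inside the MDS itself, which is why
restarting or blacklisting clients does not clear the condition, and why the
practical options are running a patched ceph-mds or avoiding the kernel
client until the backport lands.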

Regards
Yan, Zheng

> Thanks in advance,
> Randy
>
> On Wed, Aug 10, 2016 at 4:38 PM, Randy Orr <randy.orr@xxxxxxxxxx> wrote:
>>
>> Patrick,
>>
>> We are using the kernel client. We have a mix of 4.4 and 3.19 kernels on
>> the client side with plans to move away from the 3.19 kernel where/when we
>> can.
>>
>> -Randy
>>
>> On Wed, Aug 10, 2016 at 4:24 PM, Patrick Donnelly <pdonnell@xxxxxxxxxx>
>> wrote:
>>>
>>> Randy, are you using ceph-fuse or the kernel client (or something else)?
>>>
>>> On Wed, Aug 10, 2016 at 2:33 PM, Randy Orr <randy.orr@xxxxxxxxxx> wrote:
>>> > Great, thank you. Please let me know if I can be of any assistance in
>>> > testing or validating a fix.
>>> >
>>> > -Randy
>>> >
>>> > On Wed, Aug 10, 2016 at 1:21 PM, Patrick Donnelly <pdonnell@xxxxxxxxxx>
>>> > wrote:
>>> >>
>>> >> Hello Randy,
>>> >>
>>> >> On Wed, Aug 10, 2016 at 12:20 PM, Randy Orr <randy.orr@xxxxxxxxxx>
>>> >> wrote:
>>> >> > mds/Locker.cc: In function 'bool
>>> >> > Locker::check_inode_max_size(CInode*,
>>> >> > bool,
>>> >> > bool, uint64_t, bool, uint64_t, utime_t)' thread 7fc305b83700 time
>>> >> > 2016-08-09 18:51:50.626630
>>> >> > mds/Locker.cc: 2190: FAILED assert(in->is_file())
>>> >> >
>>> >> >  ceph version 10.2.1 (3a66dd4f30852819c1bdaa8ec23c795d4ad77269)
>>> >> >  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>> >> > const*)+0x8b) [0x563d1e0a2d3b]
>>> >> >  2: (Locker::check_inode_max_size(CInode*, bool, bool, unsigned
>>> >> > long,
>>> >> > bool,
>>> >> > unsigned long, utime_t)+0x15e3) [0x563d1de506a3]
>>> >> >  3:
>>> >> > (Server::handle_client_open(std::shared_ptr<MDRequestImpl>&)+0x1061)
>>> >> > [0x563d1dd386a1]
>>> >> >  4:
>>> >> >
>>> >> > (Server::dispatch_client_request(std::shared_ptr<MDRequestImpl>&)+0xa0b)
>>> >> > [0x563d1dd5709b]
>>> >> >  5: (Server::handle_client_request(MClientRequest*)+0x47f)
>>> >> > [0x563d1dd5768f]
>>> >> >  6: (Server::dispatch(Message*)+0x3bb) [0x563d1dd5b8db]
>>> >> >  7: (MDSRank::handle_deferrable_message(Message*)+0x80c)
>>> >> > [0x563d1dce1f8c]
>>> >> >  8: (MDSRank::_dispatch(Message*, bool)+0x1e1) [0x563d1dceb081]
>>> >> >  9: (MDSRankDispatcher::ms_dispatch(Message*)+0x15) [0x563d1dcec1d5]
>>> >> >  10: (MDSDaemon::ms_dispatch(Message*)+0xc3) [0x563d1dcd3f83]
>>> >> >  11: (DispatchQueue::entry()+0x78b) [0x563d1e1996cb]
>>> >> >  12: (DispatchQueue::DispatchThread::entry()+0xd) [0x563d1e08862d]
>>> >> >  13: (()+0x8184) [0x7fc30bd7c184]
>>> >> >  14: (clone()+0x6d) [0x7fc30a2d337d]
>>> >> >  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>> >> > needed to
>>> >> > interpret this.
>>> >>
>>> >> I have a bug report filed for this issue:
>>> >> http://tracker.ceph.com/issues/16983
>>> >>
>>> >> I believe it should be straightforward to solve and we'll have a fix
>>> >> for it soon.
>>> >>
>>> >> Thanks for the report!
>>> >>
>>> >> --
>>> >> Patrick Donnelly
>>> >
>>> >
>>> >
>>> > _______________________________________________
>>> > ceph-users mailing list
>>> > ceph-users@xxxxxxxxxxxxxx
>>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>> >
>>>
>>>
>>>
>>> --
>>> Patrick Donnelly
>>
>>
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


