Re: MDS crash


Hi Patrick,

We continue to hit this bug. Just a couple of questions:

1. I see that http://tracker.ceph.com/issues/16983 has been updated and that you believe it is related to http://tracker.ceph.com/issues/16013. It looks like the fix is scheduled to be backported to Jewel at some point. Is there any sense of when that might happen and when a point release will be made?

2. Looking at the pull request https://github.com/ceph/ceph/pull/8778, I ran through the testing steps that were posted and was unable to reproduce the crash.

3. When we do hit this condition, what is the best way to recover? I can keep restarting the MDS services and rebooting the hosts, but the condition persists for some time. Even after blacklisting all clients, the condition persists. It's actually unclear to me how or why it recovers at all. If it will be some time before the fix is released, is there any workaround or temporary solution?

Thanks in advance,
Randy

On Wed, Aug 10, 2016 at 4:38 PM, Randy Orr <randy.orr@xxxxxxxxxx> wrote:
Patrick,

We are using the kernel client. We have a mix of 4.4 and 3.19 kernels on the client side with plans to move away from the 3.19 kernel where/when we can.

-Randy 

On Wed, Aug 10, 2016 at 4:24 PM, Patrick Donnelly <pdonnell@xxxxxxxxxx> wrote:
Randy, are you using ceph-fuse or the kernel client (or something else)?

On Wed, Aug 10, 2016 at 2:33 PM, Randy Orr <randy.orr@xxxxxxxxxx> wrote:
> Great, thank you. Please let me know if I can be of any assistance in testing or validating a fix.
>
> -Randy
>
> On Wed, Aug 10, 2016 at 1:21 PM, Patrick Donnelly <pdonnell@xxxxxxxxxx> wrote:
>>
>> Hello Randy,
>>
>> On Wed, Aug 10, 2016 at 12:20 PM, Randy Orr <randy.orr@xxxxxxxxxx> wrote:
>> > mds/Locker.cc: In function 'bool Locker::check_inode_max_size(CInode*, bool, bool, uint64_t, bool, uint64_t, utime_t)' thread 7fc305b83700 time 2016-08-09 18:51:50.626630
>> > mds/Locker.cc: 2190: FAILED assert(in->is_file())
>> >
>> >  ceph version 10.2.1 (3a66dd4f30852819c1bdaa8ec23c795d4ad77269)
>> >  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x563d1e0a2d3b]
>> >  2: (Locker::check_inode_max_size(CInode*, bool, bool, unsigned long, bool, unsigned long, utime_t)+0x15e3) [0x563d1de506a3]
>> >  3: (Server::handle_client_open(std::shared_ptr<MDRequestImpl>&)+0x1061) [0x563d1dd386a1]
>> >  4: (Server::dispatch_client_request(std::shared_ptr<MDRequestImpl>&)+0xa0b) [0x563d1dd5709b]
>> >  5: (Server::handle_client_request(MClientRequest*)+0x47f) [0x563d1dd5768f]
>> >  6: (Server::dispatch(Message*)+0x3bb) [0x563d1dd5b8db]
>> >  7: (MDSRank::handle_deferrable_message(Message*)+0x80c) [0x563d1dce1f8c]
>> >  8: (MDSRank::_dispatch(Message*, bool)+0x1e1) [0x563d1dceb081]
>> >  9: (MDSRankDispatcher::ms_dispatch(Message*)+0x15) [0x563d1dcec1d5]
>> >  10: (MDSDaemon::ms_dispatch(Message*)+0xc3) [0x563d1dcd3f83]
>> >  11: (DispatchQueue::entry()+0x78b) [0x563d1e1996cb]
>> >  12: (DispatchQueue::DispatchThread::entry()+0xd) [0x563d1e08862d]
>> >  13: (()+0x8184) [0x7fc30bd7c184]
>> >  14: (clone()+0x6d) [0x7fc30a2d337d]
>> >  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>
>> I have a bug report filed for this issue:
>> http://tracker.ceph.com/issues/16983
>>
>> I believe it should be straightforward to solve and we'll have a fix
>> for it soon.
>>
>> Thanks for the report!
>>
>> --
>> Patrick Donnelly
>
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



--
Patrick Donnelly

