Hi Patrick,
We continue to hit this bug. Just a couple of questions:
1. I see that http://tracker.ceph.com/issues/16983 has been updated and that you believe it is related to http://tracker.ceph.com/issues/16013 . It looks like the fix is scheduled to be backported to Jewel at some point. Is there any sense of when that might happen and when a point release would be made?
2. Looking at the pull request: https://github.com/ceph/ceph/pull/8778 I ran through the testing steps that were posted and was unable to replicate the crash.
3. When we do hit this condition, what is the best way to recover? I can continue to restart the MDS services and reboot the hosts, but the condition remains for some period of time. Even after blacklisting all clients, the condition persists. It's actually unclear to me how or why it recovers at all. If it will be some time before the fix is released, is there any workaround or temporary solution?
Thanks in advance,
Randy
On Wed, Aug 10, 2016 at 4:38 PM, Randy Orr <randy.orr@xxxxxxxxxx> wrote:
Patrick,

We are using the kernel client. We have a mix of 4.4 and 3.19 kernels on the client side with plans to move away from the 3.19 kernel where/when we can.

-Randy

On Wed, Aug 10, 2016 at 4:24 PM, Patrick Donnelly <pdonnell@xxxxxxxxxx> wrote:
> Randy, are you using ceph-fuse or the kernel client (or something else)?
On Wed, Aug 10, 2016 at 2:33 PM, Randy Orr <randy.orr@xxxxxxxxxx> wrote:
> Great, thank you. Please let me know if I can be of any assistance in
> testing or validating a fix.
>
> -Randy
>
> On Wed, Aug 10, 2016 at 1:21 PM, Patrick Donnelly <pdonnell@xxxxxxxxxx>
> wrote:
>>
>> Hello Randy,
>>
>> On Wed, Aug 10, 2016 at 12:20 PM, Randy Orr <randy.orr@xxxxxxxxxx> wrote:
>> > mds/Locker.cc: In function 'bool Locker::check_inode_max_size(CInode*,
>> > bool,
>> > bool, uint64_t, bool, uint64_t, utime_t)' thread 7fc305b83700 time
>> > 2016-08-09 18:51:50.626630
>> > mds/Locker.cc: 2190: FAILED assert(in->is_file())
>> >
>> > ceph version 10.2.1 (3a66dd4f30852819c1bdaa8ec23c795d4ad77269)
>> > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>> > const*)+0x8b) [0x563d1e0a2d3b]
>> > 2: (Locker::check_inode_max_size(CInode*, bool, bool, unsigned long,
>> > bool,
>> > unsigned long, utime_t)+0x15e3) [0x563d1de506a3]
>> > 3: (Server::handle_client_open(std::shared_ptr<MDRequestImpl>&)+0x1061)
>> > [0x563d1dd386a1]
>> > 4:
>> > (Server::dispatch_client_request(std::shared_ptr<MDRequestImpl>&)+0xa0b)
>> > [0x563d1dd5709b]
>> > 5: (Server::handle_client_request(MClientRequest*)+0x47f)
>> > [0x563d1dd5768f]
>> > 6: (Server::dispatch(Message*)+0x3bb) [0x563d1dd5b8db]
>> > 7: (MDSRank::handle_deferrable_message(Message*)+0x80c)
>> > [0x563d1dce1f8c]
>> > 8: (MDSRank::_dispatch(Message*, bool)+0x1e1) [0x563d1dceb081]
>> > 9: (MDSRankDispatcher::ms_dispatch(Message*)+0x15) [0x563d1dcec1d5]
>> > 10: (MDSDaemon::ms_dispatch(Message*)+0xc3) [0x563d1dcd3f83]
>> > 11: (DispatchQueue::entry()+0x78b) [0x563d1e1996cb]
>> > 12: (DispatchQueue::DispatchThread::entry()+0xd) [0x563d1e08862d]
>> > 13: (()+0x8184) [0x7fc30bd7c184]
>> > 14: (clone()+0x6d) [0x7fc30a2d337d]
>> > NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>> > needed to
>> > interpret this.
>>
>> I have a bug report filed for this issue:
>> http://tracker.ceph.com/issues/16983
>>
>> I believe it should be straightforward to solve and we'll have a fix
>> for it soon.
>>
>> Thanks for the report!
>>
>> --
>> Patrick Donnelly
>
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
--
Patrick Donnelly