Re: kraken <--> luminous osd map decoding

On Fri, Jul 7, 2017 at 1:47 PM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
> Yeah, I need the actual commit IDs please so I make sure I'm looking
> at the right place. Checking out kraken on my box isn't showing
> anything sensible with your backtraces. ;)

The Kraken version is whatever was available here as of an hour ago:
https://download.ceph.com/debian-kraken/. The backtrace is from "some
version of kraken", but the errors I'm reporting all still reproduce
with kraken installed from download.ceph.com.

> Also your timeouts make me think something else has gone wrong for
> you. We *do* test with clients that have differing versions, although
> I'm not sure how complete the matrix is.

I see. I'll try to reproduce this outside the CI environment.
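
One detail in the backtrace quoted below may be worth noting: the vector being decoded in frame #10 has length 16777216, which is 0x01000000. That is the value you get if a small little-endian count such as 1 is read from the wrong byte offset. The sketch below is only an illustration of that failure mode under stated assumptions (a u32 little-endian count followed by u32 values, and a hypothetical three bytes of leading data the decoder does not know about). It is not Ceph's decoder and it does not establish the actual cause. The names decode_u32_vector, peek_count and EndOfBuffer are made up for the example.

```python
import struct


class EndOfBuffer(Exception):
    """Illustrative stand-in for ceph::buffer::end_of_buffer."""


def peek_count(buf, offset):
    """Read the u32 little-endian element count at the given offset."""
    if offset + 4 > len(buf):
        raise EndOfBuffer("no room for the length prefix")
    (count,) = struct.unpack_from("<I", buf, offset)
    return count


def decode_u32_vector(buf, offset=0):
    """Decode a u32 count followed by that many u32 values (assumed layout)."""
    count = peek_count(buf, offset)
    offset += 4
    values = []
    for _ in range(count):
        # Each element needs 4 bytes; running out mirrors the n=4 overrun
        # seen in frame #6 of the backtrace.
        if offset + 4 > len(buf):
            raise EndOfBuffer("ran past the end after %d values" % len(values))
        (value,) = struct.unpack_from("<I", buf, offset)
        values.append(value)
        offset += 4
    return values


# Hypothetical payload: three zero bytes the decoder does not expect,
# followed by a one-element vector holding the value 7.
payload = b"\x00\x00\x00" + struct.pack("<II", 1, 7)

print(decode_u32_vector(payload, 3))  # decoding at the right offset
print(peek_count(payload, 0))         # count seen at the wrong offset
```

Decoding the same payload from offset 0 raises EndOfBuffer after a single value, because the bogus count asks for far more data than the buffer holds.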

>
> On Fri, Jul 7, 2017 at 1:43 PM, Noah Watkins <noahwatkins@xxxxxxxxx> wrote:
>> So far the summary of pass/fail from messing around with the test matrix is:
>>
>> - Everything is installed fresh from download.ceph.com
>>
>> - The OSD in all tests is:
>>   Ubuntu Xenial + Latest Luminous
>>
>> - Client Setup 1:
>>
>> Client: jessie, trusty, xenial, yakkety, zesty + Kraken
>>   - buffer error from above
>> Client: centos7, fedora 23 + Kraken
>>   - timeouts on connect
>> Client: fedora 24, 25 + Kraken
>>   - everything works fine
>>
>> - Client Setup 2:
>>
>> Same as setup 1 but used Luminous in Debian/Ubuntu clients. All errors
>> went away. So only the timeouts remain on centos7 and fedora 23.
>>
>> - Client Setup 3:
>>
>> Same as setup 1 but used Luminous in rhel/centos/fedora clients;
>> centos7 and fedora 23 still time out.
>>
>> We haven't really messed around with our CI matrix setup, so this all
>> started sometime in the last month or so.
>>
>> On Fri, Jul 7, 2017 at 1:09 PM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>>> Exactly what version/release is this?
>>>
>>> And are you saying everything works fine on newish Fedora nodes, but
>>> you get timeouts on Centos7 and that failure on Debian derivatives?
>>> That is super weird.
>>>
>>> On Fri, Jul 7, 2017 at 11:42 AM, Noah Watkins <noahwatkins@xxxxxxxxx> wrote:
>>>> An important thing I noticed:
>>>>
>>>> Fedora 24/25: no errors
>>>> CentOS 7 and Fedora 23: cluster.connect => -ETIMEDOUT
>>>> All debian/ubuntu: above error: ceph::buffer::end_of_buffer
>>>>
>>>> On Fri, Jul 7, 2017 at 11:39 AM, Noah Watkins <noahwatkins@xxxxxxxxx> wrote:
>>>>> I'm getting osd map decoding errors on cluster connection from a
>>>>> kraken client with a luminous backend. I searched around the mailing
>>>>> list but didn't see this reported, or any note on backwards
>>>>> incompatibility. Backtrace:
>>>>>
>>>>> ===
>>>>>
>>>>> terminate called after throwing an instance of 'ceph::buffer::end_of_buffer'
>>>>>   what():  buffer::end_of_buffer
>>>>>
>>>>> Thread 9 "ms_dispatch" received signal SIGABRT, Aborted.
>>>>> [Switching to Thread 0x7fffd3fff700 (LWP 43060)]
>>>>> 0x00007fffee73c428 in __GI_raise (sig=sig@entry=6) at
>>>>> ../sysdeps/unix/sysv/linux/raise.c:54
>>>>> 54      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
>>>>> (gdb) bt
>>>>> #0  0x00007fffee73c428 in __GI_raise (sig=sig@entry=6) at
>>>>> ../sysdeps/unix/sysv/linux/raise.c:54
>>>>> #1  0x00007fffee73e02a in __GI_abort () at abort.c:89
>>>>> #2  0x00007fffeed7684d in __gnu_cxx::__verbose_terminate_handler() ()
>>>>> from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
>>>>> #3  0x00007fffeed746b6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
>>>>> #4  0x00007fffeed74701 in std::terminate() () from
>>>>> /usr/lib/x86_64-linux-gnu/libstdc++.so.6
>>>>> #5  0x00007fffeed74919 in __cxa_throw () from
>>>>> /usr/lib/x86_64-linux-gnu/libstdc++.so.6
>>>>> #6  0x00007fffef49cbb2 in ceph::buffer::ptr::iterator::get_pos_add
>>>>> (this=<optimized out>, this=<optimized out>, n=4) at
>>>>> /home/nwatkins/ceph/src/include/buffer.h:196
>>>>> #7  0x00007fffef4b5509 in ceph::buffer::ptr::iterator::get_pos_add
>>>>> (this=<synthetic pointer>, this=<synthetic pointer>, n=4) at
>>>>> /usr/include/c++/5/bits/stl_vector.h:676
>>>>> #8  denc_traits<unsigned int, void>::decode (p=<synthetic pointer>,
>>>>> o=<optimized out>) at /home/nwatkins/ceph/src/include/denc.h:224
>>>>> #9  denc<unsigned int, denc_traits<unsigned int, void> > (features=0,
>>>>> p=<synthetic pointer>, o=<optimized out>) at
>>>>> /home/nwatkins/ceph/src/include/denc.h:496
>>>>> #10 denc_traits<std::vector<unsigned int, std::allocator<unsigned int>
>>>>> >, void>::decode (p=<synthetic pointer>, s=std::vector of length
>>>>> 16777216, capacity 16777216 = {...})
>>>>>     at /home/nwatkins/ceph/src/include/denc.h:796
>>>>> #11 decode<std::vector<unsigned int, std::allocator<unsigned int> >,
>>>>> denc_traits<std::vector<unsigned int, std::allocator<unsigned int> >,
>>>>> void> > (
>>>>>     o=std::vector of length 16777216, capacity 16777216 = {...},
>>>>> p=...) at /home/nwatkins/ceph/src/include/denc.h:1157
>>>>> #12 0x00007fffef4aee0c in OSDMap::decode (this=this@entry=0x817f20,
>>>>> bl=...) at /home/nwatkins/ceph/src/osd/OSDMap.cc:2144
>>>>> #13 0x00007fffef4b0b2e in OSDMap::decode (this=0x817f20, bl=...) at
>>>>> /home/nwatkins/ceph/src/osd/OSDMap.cc:1991
>>>>> #14 0x00007fffef387d2b in Objecter::handle_osd_map
>>>>> (this=this@entry=0x8178a0, m=m@entry=0x7fffd80012e0) at
>>>>> /home/nwatkins/ceph/src/osdc/Objecter.cc:1242
>>>>> #15 0x00007fffef388a87 in Objecter::ms_dispatch (this=0x8178a0,
>>>>> m=0x7fffd80012e0) at /home/nwatkins/ceph/src/osdc/Objecter.cc:1005
>>>>> #16 0x00007fffef5fd8ca in Messenger::ms_deliver_dispatch
>>>>> (m=0x7fffd80012e0, this=0x789fb0) at
>>>>> /home/nwatkins/ceph/src/msg/Messenger.h:593
>>>>> #17 DispatchQueue::entry (this=0x78a128) at
>>>>> /home/nwatkins/ceph/src/msg/DispatchQueue.cc:197
>>>>> #18 0x00007fffef47968d in DispatchQueue::DispatchThread::entry
>>>>> (this=<optimized out>) at
>>>>> /home/nwatkins/ceph/src/msg/DispatchQueue.h:103
>>>>> #19 0x00007fffef0706ba in start_thread (arg=0x7fffd3fff700) at
>>>>> pthread_create.c:333
>>>>> #20 0x00007fffee80e3dd in clone () at
>>>>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html