Yeah, I need the actual commit IDs please so I make sure I'm looking at
the right place. Checking out kraken on my box isn't showing anything
sensible with your backtraces. ;)

Also your timeouts make me think something else has gone wrong for
you. We *do* test with clients that have differing versions, although
I'm not sure how complete the matrix is.

On Fri, Jul 7, 2017 at 1:43 PM, Noah Watkins <noahwatkins@xxxxxxxxx> wrote:
> So far the summary of pass/fail from messing around with the test matrix is:
>
> - Everything is installed fresh from download.ceph.com
>
> - The OSD in all tests is:
>     Ubuntu Xenial + Latest Luminous
>
> - Client Setup 1:
>
>     Client: jessie, trusty, xenial, yakkety, zesty + Kraken
>       - buffer error from above
>     Client: centos7, fedora 23 + Kraken
>       - timeouts on connect
>     Client: fedora 24, 25 + Kraken
>       - everything works fine
>
> - Client Setup 2:
>
>     Same as setup 1 but used Luminous in Debian/Ubuntu clients. All errors
>     went away. So only the timeouts remain on centos7 and fedora 23.
>
> - Client Setup 3:
>
>     Same as setup 1 but used Luminous in rhel/centos/fedora and centos7
>     and fedora 23 still timeout.
>
> We haven't really messed around with our CI matrix setup, so this all
> started sometime in the last month or so.
>
> On Fri, Jul 7, 2017 at 1:09 PM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>> Exactly what version/release is this?
>>
>> And are you saying everything works fine on newish Fedora nodes, but
>> you get timeouts on Centos7 and that failure on Debian derivatives?
>> That is super weird.
>>
>> On Fri, Jul 7, 2017 at 11:42 AM, Noah Watkins <noahwatkins@xxxxxxxxx> wrote:
>>> An important thing I noticed:
>>>
>>> Fedora 24/25: no errors
>>> CentOS 7 and Fedora 23: cluster.connect => -ETIMEDOUT
>>> All debian/ubuntu: above error: ceph::buffer::end_of_buffer
>>>
>>> On Fri, Jul 7, 2017 at 11:39 AM, Noah Watkins <noahwatkins@xxxxxxxxx> wrote:
>>>> I'm getting osd map decoding errors on cluster connection from a
>>>> kraken client with a luminous backend. I searched around the mailing
>>>> list but didn't see this reported, or any note on backwards
>>>> incompatibility. Back trace:
>>>>
>>>> ===
>>>>
>>>> terminate called after throwing an instance of 'ceph::buffer::end_of_buffer'
>>>> what(): buffer::end_of_buffer
>>>>
>>>> Thread 9 "ms_dispatch" received signal SIGABRT, Aborted.
>>>> [Switching to Thread 0x7fffd3fff700 (LWP 43060)]
>>>> 0x00007fffee73c428 in __GI_raise (sig=sig@entry=6) at
>>>> ../sysdeps/unix/sysv/linux/raise.c:54
>>>> 54 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
>>>> (gdb) bt
>>>> #0 0x00007fffee73c428 in __GI_raise (sig=sig@entry=6) at
>>>> ../sysdeps/unix/sysv/linux/raise.c:54
>>>> #1 0x00007fffee73e02a in __GI_abort () at abort.c:89
>>>> #2 0x00007fffeed7684d in __gnu_cxx::__verbose_terminate_handler() ()
>>>> from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
>>>> #3 0x00007fffeed746b6 in ??
>>>> () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
>>>> #4 0x00007fffeed74701 in std::terminate() () from
>>>> /usr/lib/x86_64-linux-gnu/libstdc++.so.6
>>>> #5 0x00007fffeed74919 in __cxa_throw () from
>>>> /usr/lib/x86_64-linux-gnu/libstdc++.so.6
>>>> #6 0x00007fffef49cbb2 in ceph::buffer::ptr::iterator::get_pos_add
>>>> (this=<optimized out>, this=<optimized out>, n=4) at
>>>> /home/nwatkins/ceph/src/include/buffer.h:196
>>>> #7 0x00007fffef4b5509 in ceph::buffer::ptr::iterator::get_pos_add
>>>> (this=<synthetic pointer>, this=<synthetic pointer>, n=4) at
>>>> /usr/include/c++/5/bits/stl_vector.h:676
>>>> #8 denc_traits<unsigned int, void>::decode (p=<synthetic pointer>,
>>>> o=<optimized out>) at /home/nwatkins/ceph/src/include/denc.h:224
>>>> #9 denc<unsigned int, denc_traits<unsigned int, void> > (features=0,
>>>> p=<synthetic pointer>, o=<optimized out>) at
>>>> /home/nwatkins/ceph/src/include/denc.h:496
>>>> #10 denc_traits<std::vector<unsigned int, std::allocator<unsigned int>
>>>> >, void>::decode (p=<synthetic pointer>, s=std::vector of length
>>>> 16777216, capacity 16777216 = {...})
>>>> at /home/nwatkins/ceph/src/include/denc.h:796
>>>> #11 decode<std::vector<unsigned int, std::allocator<unsigned int> >,
>>>> denc_traits<std::vector<unsigned int, std::allocator<unsigned int> >,
>>>> void> > (
>>>> o=std::vector of length 16777216, capacity 16777216 = {...},
>>>> p=...) at /home/nwatkins/ceph/src/include/denc.h:1157
>>>> #12 0x00007fffef4aee0c in OSDMap::decode (this=this@entry=0x817f20,
>>>> bl=...) at /home/nwatkins/ceph/src/osd/OSDMap.cc:2144
>>>> #13 0x00007fffef4b0b2e in OSDMap::decode (this=0x817f20, bl=...)
>>>> at /home/nwatkins/ceph/src/osd/OSDMap.cc:1991
>>>> #14 0x00007fffef387d2b in Objecter::handle_osd_map
>>>> (this=this@entry=0x8178a0, m=m@entry=0x7fffd80012e0) at
>>>> /home/nwatkins/ceph/src/osdc/Objecter.cc:1242
>>>> #15 0x00007fffef388a87 in Objecter::ms_dispatch (this=0x8178a0,
>>>> m=0x7fffd80012e0) at /home/nwatkins/ceph/src/osdc/Objecter.cc:1005
>>>> #16 0x00007fffef5fd8ca in Messenger::ms_deliver_dispatch
>>>> (m=0x7fffd80012e0, this=0x789fb0) at
>>>> /home/nwatkins/ceph/src/msg/Messenger.h:593
>>>> #17 DispatchQueue::entry (this=0x78a128) at
>>>> /home/nwatkins/ceph/src/msg/DispatchQueue.cc:197
>>>> #18 0x00007fffef47968d in DispatchQueue::DispatchThread::entry
>>>> (this=<optimized out>) at
>>>> /home/nwatkins/ceph/src/msg/DispatchQueue.h:103
>>>> #19 0x00007fffef0706ba in start_thread (arg=0x7fffd3fff700) at
>>>> pthread_create.c:333
>>>> #20 0x00007fffee80e3dd in clone () at
>>>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
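
For anyone reading the trace: frames #10 to #12 show a vector<unsigned int> already sized to 16777216 elements when a 4-byte read (n=4) throws end_of_buffer inside OSDMap::decode. The sketch below illustrates that general failure mode only. It is plain Python, not Ceph's denc code, and it is not a diagnosis of this report. The names Reader, EndOfBuffer, encode_u32_vector and decode_u32_vector are hypothetical, and the one-byte misalignment is an assumption chosen because it happens to produce the same count (0x01000000).

```python
import struct


class EndOfBuffer(Exception):
    """Raised when a read would run past the end of the input (hypothetical name)."""


class Reader:
    """Minimal cursor over a bytes object; not Ceph's bufferlist iterator."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0

    def read_u32(self) -> int:
        # Refuse to read 4 bytes that are not there.
        if self.pos + 4 > len(self.data):
            raise EndOfBuffer(f"need 4 bytes at offset {self.pos}")
        (value,) = struct.unpack_from("<I", self.data, self.pos)
        self.pos += 4
        return value


def encode_u32_vector(values):
    """Encode as a little-endian u32 count followed by the u32 elements."""
    body = b"".join(struct.pack("<I", v) for v in values)
    return struct.pack("<I", len(values)) + body


def decode_u32_vector(reader):
    """Decode a count-prefixed vector, trusting the count as a naive decoder would."""
    count = reader.read_u32()
    return [reader.read_u32() for _ in range(count)]


payload = encode_u32_vector([1])

# Writer and reader agree on the layout: the vector round-trips.
aligned = decode_u32_vector(Reader(payload))

# Assumed disagreement: the reader starts one byte late, so the bytes
# 00 00 00 01 are interpreted as the element count.
misaligned = Reader(payload)
misaligned.pos = 1
bogus_count = struct.unpack_from("<I", payload, 1)[0]

try:
    decode_u32_vector(misaligned)
    failed = False
except EndOfBuffer:
    failed = True
```

A decoder that sizes its container from the count before reading elements would allocate for the bogus count and then fail on the first element read, which is at least consistent with the shape of the trace above. Whether a layout disagreement between kraken and luminous is actually what happens here would need the commit IDs to confirm.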