I moved the drive for the crashing 10.2.10 OSD into a different
10.2.10 OSD node, and everything is working fine now.
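For anyone hitting the same thing: after moving the drive it is worth
confirming the OSD comes back up/in under its new host and that
recovery drains cleanly. The usual status commands cover it, e.g.:

  ceph osd tree
  ceph -s
  ceph health detail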
On 2018-01-10 20:42, Dyweni - Ceph-Users wrote:
Hi,
My cluster has 12.2.2 Mons and Mgrs, and 10.2.10 OSDs.
I tried adding a new 12.2.2 OSD into the mix and it crashed (expected).
However, now one of my existing 10.2.10 OSDs is crashing. I've not
had any issues with the 10.2.10 OSDs to date.
What is strange is that both the 10.2.10 and 12.2.2 OSD crashes occur
in the ms_pipe_read thread.
Also strange is that this crash appears to occur during recovery.
I have two other OSDs which, if they are running, cause this OSD to
crash; if those OSDs are stopped, this OSD does not crash. Other than
recovery, my cluster is completely idle.
Any ideas for troubleshooting / resolving?
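One way to narrow it down might be to pause recovery so the OSD stays
up, turn up messenger/OSD debug logging on it, and then let recovery
resume while watching its log for the last message handled before the
segfault. A rough sketch (these are the standard cluster flags; osd.7
and the log path are only placeholders for the affected OSD):

  ceph osd set norecover
  ceph osd set nobackfill
  ceph tell osd.7 injectargs '--debug-ms 20 --debug-osd 20'
  ceph osd unset nobackfill
  ceph osd unset norecover
  tail -f /var/log/ceph/ceph-osd.7.log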
Thread 73 "ms_pipe_read" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x6fefea20 (LWP 3913)]
0xb6b14f08 in std::_Rb_tree_increment(std::_Rb_tree_node_base const*)
() from
/usr/lib/gcc/armv7a-hardfloat-linux-gnueabi/5.4.0/libstdc++.so.6
(gdb) bt
#0 0xb6b14f08 in std::_Rb_tree_increment(std::_Rb_tree_node_base
const*) () from
/usr/lib/gcc/armv7a-hardfloat-linux-gnueabi/5.4.0/libstdc++.so.6
#1 0x0082d5d8 in std::_Rb_tree<unsigned long long, std::pair<unsigned
long long const, unsigned long long>,
std::_Select1st<std::pair<unsigned long long const, unsigned long
long> >, std::less<unsigned long long>,
std::allocator<std::pair<unsigned long long const, unsigned long long>
> >::_M_get_insert_hint_unique_pos(std::_Rb_tree_const_iterator<std::pair<unsigned long long const, unsigned long long> >, unsigned long long const&) ()
#2 0x008e32fc in std::_Rb_tree_iterator<std::pair<unsigned long long
const, unsigned long long> > std::_Rb_tree<unsigned long long,
std::pair<unsigned long long const, unsigned long long>,
std::_Select1st<std::pair<unsigned long long const, unsigned long
long> >, std::less<unsigned long long>,
std::allocator<std::pair<unsigned long long const, unsigned long long>
> >::_M_emplace_hint_unique<std::piecewise_construct_t const&, std::tuple<unsigned long long const&>, std::tuple<> >(std::_Rb_tree_const_iterator<std::pair<unsigned long long const, unsigned long long> >, std::piecewise_construct_t const&, std::tuple<unsigned long long const&>&&, std::tuple<>&&) ()
#3 0x009db0c0 in PushOp::decode(ceph::buffer::list::iterator&) ()
#4 0x009297a8 in MOSDPGPush::decode_payload() ()
#5 0x00d0d3dc in decode_message(CephContext*, int, ceph_msg_header&,
ceph_msg_footer&, ceph::buffer::list&, ceph::buffer::list&,
ceph::buffer::list&) ()
#6 0x00ea01b8 in Pipe::read_message(Message**, AuthSessionHandler*) ()
#7 0x00eaa44c in Pipe::reader() ()
#8 0x00eb2acc in Pipe::Reader::entry() ()
#9 0xb6e1a890 in start_thread () from /lib/libpthread.so.0
#10 0xb6978408 in ?? () from /lib/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt
stack?)
(gdb) frame
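If the ceph debug symbols are installed, more detail could be pulled
out of that same gdb session along these lines (the frame number is
taken from the backtrace above, where frame 3 is PushOp::decode):

(gdb) frame 3
(gdb) info locals
(gdb) bt full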
Thanks,
Dyweni
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com