Segfaults on 12.2.9 and 12.2.8

Hello Ceph users,

 

I am chasing an issue that is affecting one of our clusters across various nodes and OSDs. The cluster is around 150 OSDs across 9 nodes, running a mix of Ceph 12.2.8 and 12.2.9.

 

We have three Ceph clusters of around this size; two are CentOS 7.5 based and one is Ubuntu 16.04. The segfault only occurs on our Ubuntu cluster, and it is happening across different hardware configurations and drive types (brand, SSD/HDD, etc.).

 

Does anyone have any ideas on how to track this down?
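
From my side I can capture more detail from one of the crashing OSDs. A minimal sketch of the commands I have in mind (assuming systemd-managed OSDs and the ceph-osd-dbg debug symbol package from the upstream Ubuntu repos; package names, OSD id and core file path below are my assumptions):

# Raise logging on a suspect OSD (log levels are just a guess at what is useful)
ceph daemon osd.<id> config set debug_bluestore 20
ceph daemon osd.<id> config set debug_osd 20

# Let the OSD service write core files (assumes systemd-managed OSDs):
# add "LimitCORE=infinity" under [Service] via:
systemctl edit ceph-osd@.service
systemctl daemon-reload

# Install matching debug symbols so the next core is readable
# (package name assumed from the download.ceph.com Ubuntu packages)
apt-get install ceph-osd-dbg

# After the next segfault, pull a full backtrace from the core
# (core location depends on the configured core_pattern / apport)
gdb /usr/bin/ceph-osd <corefile>
(gdb) thread apply all bt

With a core plus matching symbols I should be able to post a fully resolved backtrace rather than the raw offsets below.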

 

Below is a snippet from the OSD log:

-------------------------------------------------------------------------------------------

 

   -11> 2019-01-14 06:11:03.317623 7fec6e596700  5 write_log_and_missing with: dirty_to: 0'0, dirty_from: 4294967295'18446744073709551615, writeout_from: 109398'1041726, trimmed: , trimmed_dups: , clear_divergent_priors: 0

   -10> 2019-01-14 06:11:03.317779 7fec6e596700  5 write_log_and_missing with: dirty_to: 0'0, dirty_from: 4294967295'18446744073709551615, writeout_from: 109398'1041727, trimmed: , trimmed_dups: , clear_divergent_priors: 0

    -9> 2019-01-14 06:11:03.317944 7fec79dad700  1 -- 10.4.36.38:6808/547667 --> 10.4.36.36:6804/5411 -- osd_repop_reply(osd.131.0:32020224 1.917 e109398/105174 ondisk, result = 0) v2 -- 0x5556f3abe780 con 0

    -8> 2019-01-14 06:11:03.318221 7fec79dad700  1 -- 10.4.36.38:6808/547667 --> 10.4.36.36:6804/5411 -- osd_repop_reply(osd.131.0:32020225 1.917 e109398/105174 ondisk, result = 0) v2 -- 0x55569f765700 con 0

    -7> 2019-01-14 06:11:03.318293 7fec79dad700  1 -- 10.4.36.38:6808/547667 --> 10.4.36.36:6804/5411 -- osd_repop_reply(osd.131.0:32020226 1.917 e109398/105174 ondisk, result = 0) v2 -- 0x5556a709a300 con 0

    -6> 2019-01-14 06:11:03.318342 7fec87db3700  5 -- 10.4.36.38:6808/547667 >> 10.4.36.36:6804/5411 conn(0x5556a0a64000 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=26010 cs=1 l=0). rx osd.131 seq 354167 0x5557575ece00 osd_repop(osd.131.0:32020228 1.917 e109398/105174) v2

    -5> 2019-01-14 06:11:03.318369 7fec87db3700  1 -- 10.4.36.38:6808/547667 <== osd.131 10.4.36.36:6804/5411 354167 ==== osd_repop(osd.131.0:32020228 1.917 e109398/105174) v2 ==== 1060+0+532 (3817260254 0 922584117) 0x5557575ece00 con 0x5556a0a64000

    -4> 2019-01-14 06:11:03.318587 7fec6e596700  5 write_log_and_missing with: dirty_to: 0'0, dirty_from: 4294967295'18446744073709551615, writeout_from: 109398'1041728, trimmed: , trimmed_dups: , clear_divergent_priors: 0

    -3> 2019-01-14 06:11:03.322142 7fec87db3700  5 -- 10.4.36.38:6808/547667 >> 10.4.36.31:6834/14584 conn(0x5556a0ca2800 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=20551 cs=1 l=0). rx osd.31 seq 1114772 0x55571b85ee00 osd_repop(client.229590.0:782182967 1.3ca e109398/105174) v2

    -2> 2019-01-14 06:11:03.322165 7fec87db3700  1 -- 10.4.36.38:6808/547667 <== osd.31 10.4.36.31:6834/14584 1114772 ==== osd_repop(client.229590.0:782182967 1.3ca e109398/105174) v2 ==== 1048+0+5897 (3029422256 0 514289860) 0x55571b85ee00 con 0x5556a0ca2800

    -1> 2019-01-14 06:11:03.322308 7fec6f598700  5 write_log_and_missing with: dirty_to: 0'0, dirty_from: 4294967295'18446744073709551615, writeout_from: 109398'21605410, trimmed: , trimmed_dups: , clear_divergent_priors: 0

     0> 2019-01-14 06:11:03.326962 7fec6e596700 -1 *** Caught signal (Segmentation fault) **

in thread 7fec6e596700 thread_name:tp_osd_tp

 

ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)

1: (()+0xa985a4) [0x55565edc45a4]

2: (()+0x11390) [0x7fec8af2a390]

3: (operator new[](unsigned long)+0xc6) [0x7fec8bd1cff6]

4: (BlueStore::Collection::open_shared_blob(unsigned long, boost::intrusive_ptr<BlueStore::Blob>)+0x3ca) [0x55565ec6875a]

5: (BlueStore::ExtentMap::decode_spanning_blobs(ceph::buffer::ptr::iterator&)+0x385) [0x55565ec702b5]

6: (BlueStore::Collection::get_onode(ghobject_t const&, bool)+0x6c9) [0x55565ec80de9]

7: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0x918) [0x55565ecb7b88]

8: (BlueStore::queue_transactions(ObjectStore::Sequencer*, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x52e) [0x55565ecb9f2e]

9: (PrimaryLogPG::queue_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<OpRequest>)+0x66) [0x55565e9d96a6]

10: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xc44) [0x55565eb04b24]

11: (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x294) [0x55565eb0dd04]

12: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) [0x55565ea17f00]

13: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x543) [0x55565e97aec3]

14: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3a9) [0x55565e7eb999]

15: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x57) [0x55565ea9d577]

16: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x1047) [0x55565e819db7]

17: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x884) [0x55565ee0c1a4]

18: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55565ee0f1e0]

19: (()+0x76ba) [0x7fec8af206ba]

20: (clone()+0x6d) [0x7fec89f9741d]

NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
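
If it is useful, the anonymous frames (e.g. frame 1 above) can also be resolved from the module offset rather than the runtime address; a minimal sketch, assuming the matching 12.2.9 ceph-osd binary with debug info available on the node:

# Resolve the "()+0xa985a4" offset from frame 1 to a source location
# (needs an unstripped ceph-osd or its debug symbols; plain addr2line may
#  not pick up separate /usr/lib/debug files, in which case gdb can be used)
addr2line -Cfe /usr/bin/ceph-osd 0xa985a4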

 

Kind regards,

Glen Baars

