Hello,

I have one particular OSD (53) that crashes at random, with the OSD process stopping.

OS: Debian 8.x
Ceph: ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)

In the logs from the time the OSD was marked as crashed, I can only see the following:

    -4> 2017-02-10 23:40:16.820894 7fadbd049700 1 -- 172.16.3.7:6825/16969 <== osd.26 172.16.2.104:0/5812 1 ==== osd_ping(ping e29842 stamp 2017-02$
    -3> 2017-02-10 23:40:16.820918 7fadbd049700 1 -- 172.16.3.7:6825/16969 --> 172.16.2.104:0/5812 -- osd_ping(ping_reply e29842 stamp 2017-02-10 2$
    -2> 2017-02-10 23:40:16.822436 7faddb149700 1 -- 172.16.2.107:6820/16969 <== client.8222771 172.16.2.2:0/1125091221 86 ==== osd_op(client.82227$
    -1> 2017-02-10 23:40:16.822453 7faddb149700 5 -- op tracker -- seq: 670, time: 2017-02-10 23:40:16.822453, event: queued_for_pg, op: osd_op(cli$
     0> 2017-02-10 23:40:16.832241 7fadd0631700 -1 *** Caught signal (Aborted) **
     in thread 7fadd0631700 thread_name:tp_osd_tp

     ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)
     1: (()+0x951cc7) [0x5556d8c4bcc7]
     2: (()+0xf890) [0x7fadf5f8e890]
     3: (gsignal()+0x37) [0x7fadf3fd5067]
     4: (abort()+0x148) [0x7fadf3fd6448]
     5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x256) [0x5556d8d51296]
     6: (FileStore::read(coll_t const&, ghobject_t const&, unsigned long, unsigned long, ceph::buffer::list&, unsigned int, bool)+0xd7c) [0x5556d89e68ec]
     7: (ReplicatedBackend::objects_read_sync(hobject_t const&, unsigned long, unsigned long, unsigned int, ceph::buffer::list*)+0xcd) [0x5556d885ce7d]
     8: (ReplicatedPG::do_osd_ops(ReplicatedPG::OpContext*, std::vector<OSDOp, std::allocator<OSDOp> >&)+0x6355) [0x5556d87f6515]
     9: (ReplicatedPG::prepare_transaction(ReplicatedPG::OpContext*)+0x61) [0x5556d8802101]
     10: (ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)+0x936) [0x5556d880a566]
     11: (ReplicatedPG::do_op(std::shared_ptr<OpRequest>&)+0x37c3) [0x5556d880f3d3]
     12: (ReplicatedPG::do_request(std::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x727) [0x5556d87c6ae7]
     13: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x420) [0x5556d866b650]
     14: (PGQueueable::RunVis::operator()(std::shared_ptr<OpRequest>&)+0x6a) [0x5556d866b8aa]
     15: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x87a) [0x5556d8687f7a]
     16: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x8b6) [0x5556d8d40c56]
     17: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5556d8d42c10]
     18: (()+0x8064) [0x7fadf5f87064]
     19: (clone()+0x6d) [0x7fadf408862d]
     NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

The backtrace shows the abort coming from an assertion failure (ceph::__ceph_assert_fail) inside FileStore::read while serving a client read. Does this point to a known issue, or do I need to dig deeper to find the cause?

Ashley
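P.S. In case it is useful, here is a minimal Python sketch of how I intend to search the full OSD log for the assertion message that should accompany this backtrace. It rests on assumptions: that the failed assertion is logged as a "<file>: <line>: FAILED assert(<condition>)" line, and that backtrace frames look like the ones in the dump above. The function names find_asserts and named_frames are made up for illustration and are not part of Ceph.

```python
import re

# Assumed log format for a failed assertion:
#   <source file>: <line number>: FAILED assert(<condition>)
ASSERT_RE = re.compile(
    r"(?P<file>\S+): (?P<line>\d+): FAILED assert\((?P<cond>.*)\)"
)

# Backtrace frame as it appears in the dump, e.g.
#   " 3: (gsignal()+0x37) [0x7fadf3fd5067]"
FRAME_RE = re.compile(
    r"^\s*\d+: \((?P<sym>.*)\+0x[0-9a-f]+\) \[0x[0-9a-f]+\]"
)


def find_asserts(log_text):
    """Return file, line and condition for every 'FAILED assert' line."""
    return [m.groupdict() for m in ASSERT_RE.finditer(log_text)]


def named_frames(log_text):
    """Return the symbol of every backtrace frame that has a name.

    Frames printed as "(()+0x...)" carry no symbol and are skipped.
    """
    frames = []
    for line in log_text.splitlines():
        m = FRAME_RE.match(line)
        if m and m.group("sym") != "()":
            frames.append(m.group("sym"))
    return frames
```

I would run it over the contents of the OSD log (by default /var/log/ceph/ceph-osd.53.log, assuming the log location has not been changed) and check whether any assertion reported by find_asserts sits in the FileStore code.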