OSD Repeated Failure


 



Hello,

 

I have one particular OSD (53) that crashes at random: the OSD process simply stops.

 

OS: Debian 8.x

Ceph: ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)

 

The only relevant entries I can see in the logs from the time the OSD was marked as crashed are the following:

 

    -4> 2017-02-10 23:40:16.820894 7fadbd049700  1 -- 172.16.3.7:6825/16969 <== osd.26 172.16.2.104:0/5812 1 ==== osd_ping(ping e29842 stamp 2017-02$

    -3> 2017-02-10 23:40:16.820918 7fadbd049700  1 -- 172.16.3.7:6825/16969 --> 172.16.2.104:0/5812 -- osd_ping(ping_reply e29842 stamp 2017-02-10 2$

    -2> 2017-02-10 23:40:16.822436 7faddb149700  1 -- 172.16.2.107:6820/16969 <== client.8222771 172.16.2.2:0/1125091221 86 ==== osd_op(client.82227$

    -1> 2017-02-10 23:40:16.822453 7faddb149700  5 -- op tracker -- seq: 670, time: 2017-02-10 23:40:16.822453, event: queued_for_pg, op: osd_op(cli$

     0> 2017-02-10 23:40:16.832241 7fadd0631700 -1 *** Caught signal (Aborted) **

in thread 7fadd0631700 thread_name:tp_osd_tp

 

ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)

1: (()+0x951cc7) [0x5556d8c4bcc7]

2: (()+0xf890) [0x7fadf5f8e890]

3: (gsignal()+0x37) [0x7fadf3fd5067]

4: (abort()+0x148) [0x7fadf3fd6448]

5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x256) [0x5556d8d51296]

6: (FileStore::read(coll_t const&, ghobject_t const&, unsigned long, unsigned long, ceph::buffer::list&, unsigned int, bool)+0xd7c) [0x5556d89e68ec]

7: (ReplicatedBackend::objects_read_sync(hobject_t const&, unsigned long, unsigned long, unsigned int, ceph::buffer::list*)+0xcd) [0x5556d885ce7d]

8: (ReplicatedPG::do_osd_ops(ReplicatedPG::OpContext*, std::vector<OSDOp, std::allocator<OSDOp> >&)+0x6355) [0x5556d87f6515]

9: (ReplicatedPG::prepare_transaction(ReplicatedPG::OpContext*)+0x61) [0x5556d8802101]

10: (ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)+0x936) [0x5556d880a566]

11: (ReplicatedPG::do_op(std::shared_ptr<OpRequest>&)+0x37c3) [0x5556d880f3d3]

12: (ReplicatedPG::do_request(std::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x727) [0x5556d87c6ae7]

13: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x420) [0x5556d866b650]

14: (PGQueueable::RunVis::operator()(std::shared_ptr<OpRequest>&)+0x6a) [0x5556d866b8aa]

15: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x87a) [0x5556d8687f7a]

16: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x8b6) [0x5556d8d40c56]

17: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5556d8d42c10]

18: (()+0x8064) [0x7fadf5f87064]

19: (clone()+0x6d) [0x7fadf408862d]

NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
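
For anyone reading the trace, the frame listed directly after `ceph::__ceph_assert_fail` is the function that hit the failed assert (here `FileStore::read`). Below is a minimal Python sketch for pulling that frame out of a pasted backtrace. It is not part of Ceph; the helper names `parse_frames` and `frame_after_assert` are hypothetical, and it assumes frames follow the `N: (symbol+0xOFFSET) [0xADDRESS]` layout shown above.

```python
import re

# Matches frames such as "6: (FileStore::read(...)+0xd7c) [0x5556d89e68ec]".
# Assumption: every frame follows the "N: (symbol+0xOFFSET) [0xADDRESS]" layout.
FRAME_RE = re.compile(
    r"^\s*(\d+):\s+\((.*)\+0x([0-9a-f]+)\)\s+\[0x([0-9a-f]+)\]\s*$"
)


def parse_frames(text):
    """Return (index, symbol, offset, address) tuples, one per backtrace frame."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.match(line)
        if m:
            idx, symbol, offset, addr = m.groups()
            frames.append((int(idx), symbol, int(offset, 16), int(addr, 16)))
    return frames


def frame_after_assert(frames):
    """Return the frame that follows ceph::__ceph_assert_fail, or None."""
    for i, (_, symbol, _, _) in enumerate(frames):
        if "__ceph_assert_fail" in symbol and i + 1 < len(frames):
            return frames[i + 1]
    return None


# A few frames copied from the backtrace in this post.
SAMPLE = """\
4: (abort()+0x148) [0x7fadf3fd6448]
5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x256) [0x5556d8d51296]
6: (FileStore::read(coll_t const&, ghobject_t const&, unsigned long, unsigned long, ceph::buffer::list&, unsigned int, bool)+0xd7c) [0x5556d89e68ec]
7: (ReplicatedBackend::objects_read_sync(hobject_t const&, unsigned long, unsigned long, unsigned int, ceph::buffer::list*)+0xcd) [0x5556d885ce7d]
"""

if __name__ == "__main__":
    culprit = frame_after_assert(parse_frames(SAMPLE))
    if culprit is not None:
        idx, symbol, offset, addr = culprit
        print(f"frame {idx}: {symbol} +{offset:#x} at {addr:#x}")
```

The offset and address it reports are only meaningful together with the matching executable, as the NOTE above says.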

 

 

Does this point to a known issue, or do I need to dig deeper to find the cause?

 

Ashley

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
