Re: Luminous RC OSD Crashing


 



I've logged a bug ticket; let me know if you need anything further: http://tracker.ceph.com/issues/20687

 

From: Ashley Merrick
Sent: Wednesday, 19 July 2017 8:05 PM
To: ceph-users@xxxxxxxx
Subject: RE: Luminous RC OSD Crashing

 

I also found this error on some of the crashing OSDs:

 

2017-07-19 12:50:57.587194 7f19348f1700 -1 /build/ceph-12.1.1/src/osd/PrimaryLogPG.cc: In function 'virtual void C_CopyFrom_AsyncReadCb::finish(int)' thread 7f19348f1700 time 2017-07-19 12:50:57.583192

/build/ceph-12.1.1/src/osd/PrimaryLogPG.cc: 7585: FAILED assert(len <= reply_obj.data.length())

 

ceph version 12.1.1 (f3e663a190bf2ed12c7e3cda288b9a159572c800) luminous (rc)

1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x55f1c67bfe32]

2: (C_CopyFrom_AsyncReadCb::finish(int)+0x131) [0x55f1c63ec9e1]

3: (Context::complete(int)+0x9) [0x55f1c626b8b9]

4: (()+0x79bc70) [0x55f1c650fc70]

5: (ECBackend::kick_reads()+0x48) [0x55f1c651f908]

6: (CallClientContexts::finish(std::pair<RecoveryMessages*, ECBackend::read_result_t&>&)+0x562) [0x55f1c652e162]

7: (ECBackend::complete_read_op(ECBackend::ReadOp&, RecoveryMessages*)+0x7f) [0x55f1c650495f]

8: (ECBackend::handle_sub_read_reply(pg_shard_t, ECSubReadReply&, RecoveryMessages*, ZTracer::Trace const&)+0x1077) [0x55f1c6519da7]

9: (ECBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x2a6) [0x55f1c651a946]

10: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x5e7) [0x55f1c638f667]

11: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) [0x55f1c622fb07]

12: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x57) [0x55f1c648a0a7]

13: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x108c) [0x55f1c625b34c]

14: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x93d) [0x55f1c67c5add]

15: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55f1c67c7d00]

16: (()+0x8064) [0x7f194cf89064]

17: (clone()+0x6d) [0x7f194c07d62d]

NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
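For context, a minimal sketch (not Ceph's actual code; the struct and function names here are illustrative stand-ins) of the kind of bounds check that assert performs: the copy-from callback verifies that the length it was asked to copy never exceeds the bytes the async read actually returned.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the reply object carrying read data;
// not the real PrimaryLogPG / bufferlist types.
struct ReplyObj {
    std::string data;  // bytes returned by the async read
};

// Mirrors the failing check at PrimaryLogPG.cc:7585: the callback
// asserts len <= reply_obj.data.length(). Returning false here
// corresponds to the condition that tripped the assert on the OSD,
// i.e. the EC read handed back fewer bytes than the copy expected.
bool copy_from_len_ok(std::size_t len, const ReplyObj& reply_obj) {
    return len <= reply_obj.data.length();
}
```

In the crashing OSDs the erasure-coded read path evidently completed with a short buffer, so the equivalent of this check failed and the assert aborted the daemon.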

 

--- begin dump of recent events ---

-10000> 2017-07-19 12:50:46.691617 7f194a0ec700  1 -- 172.16.3.3:6806/3482 <== osd.28 172.16.3.4:6800/27027 18606 ==== MOSDECSubOpRead(6.71s2 102354/102344 ECSubRead(tid=605721, to_read={6:8e0c91b4:::rbd_data.61c662238e1f29.000000000000$

-9999> 2017-07-19 12:50:46.692100 7f19330ee700  1 -- 172.16.3.3:6806/3482 --> 172.16.3.4:6800/27027 -- MOSDECSubOpReadReply(6.71s0 102354/102344 ECSubReadReply(tid=605720, attrs_read=0)) v2 -- 0x55f1d5083180 con 0

-9998> 2017-07-19 12:50:46.692388 7f19330ee700  1 -- 172.16.3.3:6806/3482 --> 172.16.3.4:6800/27027 -- MOSDECSubOpReadReply(6.71s0 102354/102344 ECSubReadReply(tid=605721, attrs_read=0)) v2 -- 0x55f2412c1700 con 0

 

,Ashley

 

From: Ashley Merrick
Sent: Wednesday, 19 July 2017 7:08 PM
To: Ashley Merrick <ashley@xxxxxxxxxxxxxx>; ceph-users@xxxxxxxx
Subject: RE: Luminous RC OSD Crashing

 

I have just found: http://tracker.ceph.com/issues/20167

 

It looks to be the same error in an earlier release (12.0.2-1883-gb3f5819). It was marked as resolved a month ago by Sage, but I can't see how or by what commit. I would guess that fix made it into the latest RC?

 

,Ashley

 

From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Ashley Merrick
Sent: Wednesday, 19 July 2017 5:47 PM
To: ceph-users@xxxxxxxx
Subject: Luminous RC OSD Crashing

 

Hello,

 

I'm getting random OSDs crashing during a backfill/rebuild on the latest RC. From the logs so far I have seen the following:

 

172.16.3.10:6802/21760 --> 172.16.3.6:6808/15997 -- pg_update_log_missing(6.19ds12 epoch 101931/101928 rep_tid 59 entries 101931'55683 (0'0) error    6:b984d72a:::rbd_data.a1d870238e1f29.0000000000007c0b:head by client.30604127.0:31963 0.000000 -2) v2 -- 0x55bea0faefc0 con 0

 

log_channel(cluster) log [ERR] : 4.11c required past_interval bounds are empty [101500,100085) but past_intervals is not: ([90726,100084...0083] acting 28)

 

failed to decode message of type 70 v3: buffer::malformed_input: void osd_peer_stat_t::decode(ceph::buffer::list::iterator&) no longer u...1 < struct_compat
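That last decode error suggests a peer sent an osd_peer_stat_t encoded with a compat version newer than this OSD understands. A minimal sketch of the versioned-decode pattern involved (illustrative only; the real code uses Ceph's DECODE_START machinery and bufferlist iterators):

```cpp
#include <stdexcept>

// Illustrative header carried by each Ceph-encoded struct:
// struct_v is the writer's encoding version, struct_compat is the
// oldest version a reader must understand to decode it at all.
struct EncHeader {
    unsigned struct_v;
    unsigned struct_compat;
};

// A reader that only understands up to `my_version` must reject a
// payload whose struct_compat is newer; Ceph surfaces this as the
// buffer::malformed_input "... < struct_compat" error seen above.
void decode_check(const EncHeader& h, unsigned my_version) {
    if (my_version < h.struct_compat)
        throw std::runtime_error(
            "malformed_input: no longer understand old encoding version");
}
```

If that is what's happening, it would point to mixed-version daemons (or a changed encoding between the RC and an earlier build) exchanging peer-stat messages during the backfill.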

 

Let me know if you need anything else.

 

,Ashley

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
