OSD crashes on EC recovery

Hi,

We run a Ceph 10.2.1 cluster across 35 nodes with a total of 595 OSDs. We have a mixture of normally replicated volumes and EC volumes using the following erasure-code profile:

# ceph osd erasure-code-profile get rsk8m5
jerasure-per-chunk-alignment=false
k=8
m=5
plugin=jerasure
ruleset-failure-domain=host
ruleset-root=default
technique=reed_sol_van
w=8
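For anyone unfamiliar with these parameters, a quick sketch of what they imply (derived from the profile above, nothing beyond it): each object is split into k data chunks plus m coding chunks, and since ruleset-failure-domain=host, up to m whole hosts can be lost without data loss.

```python
# Sketch of the capacity/fault-tolerance arithmetic for k=8, m=5.
k, m = 8, 5

raw_per_logical = (k + m) / k   # raw bytes stored per logical byte
max_failures = m                # simultaneous host failures tolerated

print(raw_per_logical)  # 1.625
print(max_failures)     # 5
```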

We recently had a disk failure, and while swapping the disk out we seem to have hit a bug: during recovery, OSDs crash while trying to repair certain PGs that may have been corrupted.

For example:
    -3> 2016-08-10 12:38:21.302938 7f893e2d7700  5 -- op tracker -- seq: 3434, time: 2016-08-10 12:38:21.302938, event: queued_for_pg, op: MOSDECSubOpReadReply(63.1a18s0 47661 ECSubReadReply(tid=1, attrs_read=0))
    -2> 2016-08-10 12:38:21.302981 7f89bef50700  1 -- 10.93.105.11:6831/2674119 --> 10.93.105.22:6802/357033 -- osd_map(47662..47663 src has 32224..47663) v3 -- ?+0 0x559c1057f3c0 con 0x559c0664a700
    -1> 2016-08-10 12:38:21.302996 7f89bef50700  5 -- op tracker -- seq: 3434, time: 2016-08-10 12:38:21.302996, event: reached_pg, op: MOSDECSubOpReadReply(63.1a18s0 47661 ECSubReadReply(tid=1, attrs_read=0))
     0> 2016-08-10 12:38:21.306193 7f89bef50700 -1 osd/ECBackend.cc: In function 'virtual void OnRecoveryReadComplete::finish(std::pair<RecoveryMessages*, ECBackend::read_result_t&>&)' thread 7f89bef50700 time 2016-08-10 12:38:21.303012
osd/ECBackend.cc: 203: FAILED assert(res.errors.empty())

after which the ceph-osd daemon aborts. I've attached an extract of the log file showing a bit more context.

Does anyone have any ideas? I'm now stuck with a PG that is down+remapped+peering. "ceph pg query" tells me that peering is blocked due to the loss of an OSD, but restarting that OSD just results in another crash of the ceph-osd daemon. We tried to force a rebuild by using ceph-objectstore-tool to delete the PG shard on some of the crashing OSDs, but that didn't help one iota.
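For reference, the sort of invocation involved is sketched below. The OSD id, paths, file name and pgid are illustrative placeholders, not a transcript of what we ran, and the OSD daemon must be stopped first.

```shell
# Illustrative only: substitute the id/paths of whichever crashing OSD
# holds a shard of the affected PG, with that OSD daemon stopped.

# Export the PG shard as a safety copy before deleting it.
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-34 \
    --journal-path /var/lib/ceph/osd/ceph-34/journal \
    --pgid 63.1a18s0 --op export --file /root/63.1a18s0.export

# Remove the shard so recovery can rebuild it from surviving chunks.
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-34 \
    --journal-path /var/lib/ceph/osd/ceph-34/journal \
    --pgid 63.1a18s0 --op remove
```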

Any help would be greatly appreciated,

regards,

Roeland

    -4> 2016-08-10 12:38:21.302910 7f893e2d7700  1 -- 10.93.105.11:6831/2674119 <== osd.290 10.93.105.22:6802/357033 42 ==== MOSDECSubOpReadReply(63.1a18s0 47661 ECSubReadReply(tid=1, attrs_read=0)) v1 ==== 170+0+0 (1521384358 0 0) 0x559bfb611400 con 0x559c0664a700
    -3> 2016-08-10 12:38:21.302938 7f893e2d7700  5 -- op tracker -- seq: 3434, time: 2016-08-10 12:38:21.302938, event: queued_for_pg, op: MOSDECSubOpReadReply(63.1a18s0 47661 ECSubReadReply(tid=1, attrs_read=0))
    -2> 2016-08-10 12:38:21.302981 7f89bef50700  1 -- 10.93.105.11:6831/2674119 --> 10.93.105.22:6802/357033 -- osd_map(47662..47663 src has 32224..47663) v3 -- ?+0 0x559c1057f3c0 con 0x559c0664a700
    -1> 2016-08-10 12:38:21.302996 7f89bef50700  5 -- op tracker -- seq: 3434, time: 2016-08-10 12:38:21.302996, event: reached_pg, op: MOSDECSubOpReadReply(63.1a18s0 47661 ECSubReadReply(tid=1, attrs_read=0))
     0> 2016-08-10 12:38:21.306193 7f89bef50700 -1 osd/ECBackend.cc: In function 'virtual void OnRecoveryReadComplete::finish(std::pair<RecoveryMessages*, ECBackend::read_result_t&>&)' thread 7f89bef50700 time 2016-08-10 12:38:21.303012
osd/ECBackend.cc: 203: FAILED assert(res.errors.empty())

 ceph version 10.2.1 (3a66dd4f30852819c1bdaa8ec23c795d4ad77269)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x559be1135e2b]
 2: (OnRecoveryReadComplete::finish(std::pair<RecoveryMessages*, ECBackend::read_result_t&>&)+0x192) [0x559be0cf6122]
 3: (GenContext<std::pair<RecoveryMessages*, ECBackend::read_result_t&>&>::complete(std::pair<RecoveryMessages*, ECBackend::read_result_t&>&)+0x9) [0x559be0ce3b89]
 4: (ECBackend::complete_read_op(ECBackend::ReadOp&, RecoveryMessages*)+0x63) [0x559be0cda003]
 5: (ECBackend::handle_sub_read_reply(pg_shard_t, ECSubReadReply&, RecoveryMessages*)+0xf68) [0x559be0cdafd8]
 6: (ECBackend::handle_message(std::shared_ptr<OpRequest>)+0x186) [0x559be0ce2236]
 7: (ReplicatedPG::do_request(std::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0xed) [0x559be0c1c30d]
 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f5) [0x559be0adb285]
 9: (PGQueueable::RunVis::operator()(std::shared_ptr<OpRequest>&)+0x5d) [0x559be0adb4ad]
 10: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x869) [0x559be0adfec9]
 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x885) [0x559be1126195]
 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x559be11280d0]
 13: (()+0x8184) [0x7f89e8b7b184]
 14: (clone()+0x6d) [0x7f89e6ca937d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_mirror
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
   1/ 5 compressor
   1/ 5 newstore
   1/ 5 bluestore
   1/ 5 bluefs
   1/ 3 bdev
   1/ 5 kstore
   4/ 5 rocksdb
   4/ 5 leveldb
   1/ 5 kinetic
   1/ 5 fuse
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-osd.34.log
--- end dump of recent events ---
2016-08-10 12:38:21.314493 7f89d3c25700  1 leveldb: Generated table #22502: 17869 keys, 2126035 bytes
2016-08-10 12:38:21.357565 7f89bef50700 -1 *** Caught signal (Aborted) **
 in thread 7f89bef50700 thread_name:tp_osd_tp

 ceph version 10.2.1 (3a66dd4f30852819c1bdaa8ec23c795d4ad77269)
 1: (()+0x8eac12) [0x559be103ec12]
 2: (()+0x10330) [0x7f89e8b83330]
 3: (gsignal()+0x37) [0x7f89e6be5c37]
 4: (abort()+0x148) [0x7f89e6be9028]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x265) [0x559be1136005]
 6: (OnRecoveryReadComplete::finish(std::pair<RecoveryMessages*, ECBackend::read_result_t&>&)+0x192) [0x559be0cf6122]
 7: (GenContext<std::pair<RecoveryMessages*, ECBackend::read_result_t&>&>::complete(std::pair<RecoveryMessages*, ECBackend::read_result_t&>&)+0x9) [0x559be0ce3b89]
 8: (ECBackend::complete_read_op(ECBackend::ReadOp&, RecoveryMessages*)+0x63) [0x559be0cda003]
 9: (ECBackend::handle_sub_read_reply(pg_shard_t, ECSubReadReply&, RecoveryMessages*)+0xf68) [0x559be0cdafd8]
 10: (ECBackend::handle_message(std::shared_ptr<OpRequest>)+0x186) [0x559be0ce2236]
 11: (ReplicatedPG::do_request(std::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0xed) [0x559be0c1c30d]
 12: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f5) [0x559be0adb285]
 13: (PGQueueable::RunVis::operator()(std::shared_ptr<OpRequest>&)+0x5d) [0x559be0adb4ad]
 14: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x869) [0x559be0adfec9]
 15: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x885) [0x559be1126195]
 16: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x559be11280d0]
 17: (()+0x8184) [0x7f89e8b7b184]
 18: (clone()+0x6d) [0x7f89e6ca937d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
