OSD crashes on EC recovery

Hi,

We run a Ceph 10.2.1 cluster across 35 nodes with a total of 595 OSDs. We have a mixture of normally replicated volumes and EC volumes using the following erasure-code profile:

# ceph osd erasure-code-profile get rsk8m5
jerasure-per-chunk-alignment=false
k=8
m=5
plugin=jerasure
ruleset-failure-domain=host
ruleset-root=default
technique=reed_sol_van
w=8
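For anyone unfamiliar with these parameters, a quick sketch of what they imply (derived from the profile above, nothing beyond it): each object is split into k data chunks plus m coding chunks, and since ruleset-failure-domain=host, up to m whole hosts can be lost without data loss.

```python
# Sketch of the capacity/fault-tolerance arithmetic for k=8, m=5.
k, m = 8, 5

raw_per_logical = (k + m) / k   # raw bytes stored per logical byte
max_failures = m                # simultaneous host failures tolerated

print(raw_per_logical)  # 1.625
print(max_failures)     # 5
```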

We recently had a disk failure, and while swapping the disk out we seem to have hit a bug: during recovery, OSDs crash while trying to repair certain PGs that may have been corrupted.

For example:
    -3> 2016-08-10 12:38:21.302938 7f893e2d7700  5 -- op tracker -- seq: 3434, time: 2016-08-10 12:38:21.302938, event: queued_for_pg, op: MOSDECSubOpReadReply(63.1a18s0 47661 ECSubReadReply(tid=1, attrs_read=0))
    -2> 2016-08-10 12:38:21.302981 7f89bef50700  1 -- 10.93.105.11:6831/2674119 --> 10.93.105.22:6802/357033 -- osd_map(47662..47663 src has 32224..47663) v3 -- ?+0 0x559c1057f3c0 con 0x559c0664a700
    -1> 2016-08-10 12:38:21.302996 7f89bef50700  5 -- op tracker -- seq: 3434, time: 2016-08-10 12:38:21.302996, event: reached_pg, op: MOSDECSubOpReadReply(63.1a18s0 47661 ECSubReadReply(tid=1, attrs_read=0))
     0> 2016-08-10 12:38:21.306193 7f89bef50700 -1 osd/ECBackend.cc: In function 'virtual void OnRecoveryReadComplete::finish(std::pair<RecoveryMessages*, ECBackend::read_result_t&>&)' thread 7f89bef50700 time 2016-08-10 12:38:21.303012
osd/ECBackend.cc: 203: FAILED assert(res.errors.empty())

after which the ceph-osd daemon aborts. I've attached an extract of the log file showing a bit more context.

Does anyone have any ideas? I'm now stuck with a PG that is down+remapped+peering. "ceph pg query" tells me that peering is blocked due to the loss of an OSD, but restarting that OSD just results in another crash of the ceph-osd daemon. We tried to force a rebuild by using ceph-objectstore-tool to delete the PG shard on some of the crashing OSDs, but that didn't help one iota.
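For reference, the sort of invocation involved is sketched below. The OSD id, paths, file name and pgid are illustrative placeholders, not a transcript of what we ran, and the OSD daemon must be stopped first.

```shell
# Illustrative only: substitute the id/paths of whichever crashing OSD
# holds a shard of the affected PG, with that OSD daemon stopped.

# Export the PG shard as a safety copy before deleting it.
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-34 \
    --journal-path /var/lib/ceph/osd/ceph-34/journal \
    --pgid 63.1a18s0 --op export --file /root/63.1a18s0.export

# Remove the shard so recovery can rebuild it from surviving chunks.
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-34 \
    --journal-path /var/lib/ceph/osd/ceph-34/journal \
    --pgid 63.1a18s0 --op remove
```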

Any help would be greatly appreciated,

regards,

Roeland

    -4> 2016-08-10 12:38:21.302910 7f893e2d7700  1 -- 10.93.105.11:6831/2674119 <== osd.290 10.93.105.22:6802/357033 42 ==== MOSDECSubOpReadReply(63.1a18s0 47661 ECSubReadReply(tid=1, attrs_read=0)) v1 ==== 170+0+0 (1521384358 0 0) 0x559bfb611400 con 0x559c0664a700
    -3> 2016-08-10 12:38:21.302938 7f893e2d7700  5 -- op tracker -- seq: 3434, time: 2016-08-10 12:38:21.302938, event: queued_for_pg, op: MOSDECSubOpReadReply(63.1a18s0 47661 ECSubReadReply(tid=1, attrs_read=0))
    -2> 2016-08-10 12:38:21.302981 7f89bef50700  1 -- 10.93.105.11:6831/2674119 --> 10.93.105.22:6802/357033 -- osd_map(47662..47663 src has 32224..47663) v3 -- ?+0 0x559c1057f3c0 con 0x559c0664a700
    -1> 2016-08-10 12:38:21.302996 7f89bef50700  5 -- op tracker -- seq: 3434, time: 2016-08-10 12:38:21.302996, event: reached_pg, op: MOSDECSubOpReadReply(63.1a18s0 47661 ECSubReadReply(tid=1, attrs_read=0))
     0> 2016-08-10 12:38:21.306193 7f89bef50700 -1 osd/ECBackend.cc: In function 'virtual void OnRecoveryReadComplete::finish(std::pair<RecoveryMessages*, ECBackend::read_result_t&>&)' thread 7f89bef50700 time 2016-08-10 12:38:21.303012
osd/ECBackend.cc: 203: FAILED assert(res.errors.empty())

 ceph version 10.2.1 (3a66dd4f30852819c1bdaa8ec23c795d4ad77269)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x559be1135e2b]
 2: (OnRecoveryReadComplete::finish(std::pair<RecoveryMessages*, ECBackend::read_result_t&>&)+0x192) [0x559be0cf6122]
 3: (GenContext<std::pair<RecoveryMessages*, ECBackend::read_result_t&>&>::complete(std::pair<RecoveryMessages*, ECBackend::read_result_t&>&)+0x9) [0x559be0ce3b89]
 4: (ECBackend::complete_read_op(ECBackend::ReadOp&, RecoveryMessages*)+0x63) [0x559be0cda003]
 5: (ECBackend::handle_sub_read_reply(pg_shard_t, ECSubReadReply&, RecoveryMessages*)+0xf68) [0x559be0cdafd8]
 6: (ECBackend::handle_message(std::shared_ptr<OpRequest>)+0x186) [0x559be0ce2236]
 7: (ReplicatedPG::do_request(std::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0xed) [0x559be0c1c30d]
 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f5) [0x559be0adb285]
 9: (PGQueueable::RunVis::operator()(std::shared_ptr<OpRequest>&)+0x5d) [0x559be0adb4ad]
 10: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x869) [0x559be0adfec9]
 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x885) [0x559be1126195]
 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x559be11280d0]
 13: (()+0x8184) [0x7f89e8b7b184]
 14: (clone()+0x6d) [0x7f89e6ca937d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_mirror
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
   1/ 5 compressor
   1/ 5 newstore
   1/ 5 bluestore
   1/ 5 bluefs
   1/ 3 bdev
   1/ 5 kstore
   4/ 5 rocksdb
   4/ 5 leveldb
   1/ 5 kinetic
   1/ 5 fuse
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-osd.34.log
--- end dump of recent events ---
2016-08-10 12:38:21.314493 7f89d3c25700  1 leveldb: Generated table #22502: 17869 keys, 2126035 bytes
2016-08-10 12:38:21.357565 7f89bef50700 -1 *** Caught signal (Aborted) **
 in thread 7f89bef50700 thread_name:tp_osd_tp

 ceph version 10.2.1 (3a66dd4f30852819c1bdaa8ec23c795d4ad77269)
 1: (()+0x8eac12) [0x559be103ec12]
 2: (()+0x10330) [0x7f89e8b83330]
 3: (gsignal()+0x37) [0x7f89e6be5c37]
 4: (abort()+0x148) [0x7f89e6be9028]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x265) [0x559be1136005]
 6: (OnRecoveryReadComplete::finish(std::pair<RecoveryMessages*, ECBackend::read_result_t&>&)+0x192) [0x559be0cf6122]
 7: (GenContext<std::pair<RecoveryMessages*, ECBackend::read_result_t&>&>::complete(std::pair<RecoveryMessages*, ECBackend::read_result_t&>&)+0x9) [0x559be0ce3b89]
 8: (ECBackend::complete_read_op(ECBackend::ReadOp&, RecoveryMessages*)+0x63) [0x559be0cda003]
 9: (ECBackend::handle_sub_read_reply(pg_shard_t, ECSubReadReply&, RecoveryMessages*)+0xf68) [0x559be0cdafd8]
 10: (ECBackend::handle_message(std::shared_ptr<OpRequest>)+0x186) [0x559be0ce2236]
 11: (ReplicatedPG::do_request(std::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0xed) [0x559be0c1c30d]
 12: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f5) [0x559be0adb285]
 13: (PGQueueable::RunVis::operator()(std::shared_ptr<OpRequest>&)+0x5d) [0x559be0adb4ad]
 14: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x869) [0x559be0adfec9]
 15: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x885) [0x559be1126195]
 16: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x559be11280d0]
 17: (()+0x8184) [0x7f89e8b7b184]
 18: (clone()+0x6d) [0x7f89e6ca937d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
