Re: OSD crashes on EC recovery

Roeland,

We're seeing the same problems in our cluster.  I can't offer you a solution that gets the OSD back, but I can tell you what I did to work around it.

We're running five 0.94.6 clusters with 9 nodes / 648 HDD OSDs and a k=7, m=2 erasure-coded .rgw.buckets pool.  During the backfilling after a recent disk replacement, we had four OSDs get into a very similar state.

2016-08-09 07:40:12.475699 7f025b06b700 -1 osd/ECBackend.cc: In function 'void ECBackend::handle_recovery_push(PushOp&, RecoveryMessages*)' thread 7f025b06b700 time 2016-08-09 07:40:12.472819
osd/ECBackend.cc: 281: FAILED assert(op.attrset.count(string("_")))

 ceph version 0.94.6-2 (f870be457b16e4ff56ced74ed3a3c9a4c781f281)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xba997b]
 2: (ECBackend::handle_recovery_push(PushOp&, RecoveryMessages*)+0xd7f) [0xa239ff]
 3: (ECBackend::handle_message(std::tr1::shared_ptr<OpRequest>)+0x1de) [0xa2600e]
 4: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x167) [0x8305e7]
 5: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3bd) [0x6a157d]
 6: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x338) [0x6a1aa8]
 7: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x85f) [0xb994cf]
 8: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xb9b5f0]
 9: (()+0x8184) [0x7f0284e35184]
 10: (clone()+0x6d) [0x7f028324c37d]
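
For what it's worth, the assert fires because the object being pushed to the recovering shard arrives without its "_" attribute (the xattr that carries the object_info_t).  If you want to poke at the surviving shards, something along these lines should show whether that attribute is still present on disk (a sketch only; the OSD must be stopped first, and the OSD id, pgid and object name below are placeholders):

# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<id> \
      --journal-path /var/lib/ceph/osd/ceph-<id>/journal \
      --pgid <pgid>s<shard> --op list
# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<id> \
      --journal-path /var/lib/ceph/osd/ceph-<id>/journal \
      --pgid <pgid>s<shard> '<object-json-from-list>' list-attrs

A shard whose attribute list is missing "_" is the one feeding the bad push.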

To allow the cluster to recover, we ended up reweighting the OSDs that got into this state to 0 (ceph osd crush reweight <osd-id> 0).  This of course kicks off a long round of backfilling, but the cluster eventually recovers.  We've never found a way to get the OSD healthy again that doesn't involve nuking the underlying disk and starting over.  We've had 10 OSDs get into this state across 2 clusters in the last few months, and the failure/crash message is always the same.  If someone does know of a way to recover the OSD, that would be great.
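
In case it's useful, the full drain-and-rebuild cycle for one of these OSDs looks roughly like this on our side (a sketch only; osd.123 is a placeholder, and the stop/wipe/re-create steps depend on how you deploy your OSDs):

# ceph osd crush reweight osd.123 0     (drain the crashing OSD; this kicks off the backfill)
# ceph -s                               (repeat until all PGs are active+clean again)
# service ceph stop osd.123             (or systemctl stop ceph-osd@123, depending on your init system)
# ceph osd crush remove osd.123
# ceph auth del osd.123
# ceph osd rm 123

...then wipe the disk and re-create the OSD however you normally deploy them.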

I hope this helps.

Brian Felton

On Wed, Aug 10, 2016 at 10:17 AM, Roeland Mertens <roeland.mertens@xxxxxxxxxxxxxxx> wrote:
Hi,

We run a Ceph 10.2.1 cluster across 35 nodes with a total of 595 OSDs.  We have a mixture of normally replicated pools and EC pools using the following erasure-code profile:

# ceph osd erasure-code-profile get rsk8m5
jerasure-per-chunk-alignment=false
k=8
m=5
plugin=jerasure
ruleset-failure-domain=host
ruleset-root=default
technique=reed_sol_van
w=8
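
For anyone who wants to reproduce a similar setup: a profile like this is typically created and attached to an EC pool roughly as follows (the pool name and pg counts are placeholders):

# ceph osd erasure-code-profile set rsk8m5 k=8 m=5 plugin=jerasure technique=reed_sol_van ruleset-failure-domain=host ruleset-root=default
# ceph osd pool create <ec-pool-name> <pg_num> <pgp_num> erasure rsk8m5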

We recently had a disk failure, and during the swap-out we appear to have hit a bug: during recovery, OSDs crash when trying to repair certain PGs that may have been corrupted.

For example:
   -3> 2016-08-10 12:38:21.302938 7f893e2d7700  5 -- op tracker -- seq: 3434, time: 2016-08-10 12:38:21.302938, event: queued_for_pg, op: MOSDECSubOpReadReply(63.1a18s0 47661 ECSubReadReply(tid=1, attrs_read=0))
    -2> 2016-08-10 12:38:21.302981 7f89bef50700  1 -- 10.93.105.11:6831/2674119 --> 10.93.105.22:6802/357033 -- osd_map(47662..47663 src has 32224..47663) v3 -- ?+0 0x559c1057f3c0 con 0x559c0664a700
    -1> 2016-08-10 12:38:21.302996 7f89bef50700  5 -- op tracker -- seq: 3434, time: 2016-08-10 12:38:21.302996, event: reached_pg, op: MOSDECSubOpReadReply(63.1a18s0 47661 ECSubReadReply(tid=1, attrs_read=0))
     0> 2016-08-10 12:38:21.306193 7f89bef50700 -1 osd/ECBackend.cc: In function 'virtual void OnRecoveryReadComplete::finish(std::pair<RecoveryMessages*, ECBackend::read_result_t&>&)' thread 7f89bef50700 time 2016-08-10 12:38:21.303012
osd/ECBackend.cc: 203: FAILED assert(res.errors.empty())

then the ceph-osd daemon goes splat. I've attached an extract of a logfile showing a bit more.

Does anyone have any ideas? I'm now stuck with a PG that is down+remapped+peering. ceph pg query tells me that peering is blocked due to the loss of an OSD, but restarting that OSD just results in another crash of the ceph-osd daemon. We tried to force a rebuild by using ceph-objectstore-tool to delete the PG shard on some of the crashing OSDs, but that didn't help one iota.
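
For reference, the kind of commands involved look roughly like this (a sketch, not necessarily our exact invocations; the pgid, OSD id and paths are placeholders, the OSD has to be stopped before running ceph-objectstore-tool, and exporting a shard before removing it is strongly recommended):

# ceph pg <pgid> query                  (check recovery_state for "blocked" and "down_osds_we_would_probe")
# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<id> \
      --journal-path /var/lib/ceph/osd/ceph-<id>/journal --op list-pgs
# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<id> \
      --journal-path /var/lib/ceph/osd/ceph-<id>/journal \
      --pgid <pgid>s<shard> --op export --file /tmp/<pgid>.export
# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<id> \
      --journal-path /var/lib/ceph/osd/ceph-<id>/journal \
      --pgid <pgid>s<shard> --op remove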

Any help would be greatly appreciated,

regards,

Roeland

--
This email is sent on behalf of Genomics plc, a public limited company registered in England and Wales with registered number 8839972, VAT registered number 189 2635 65 and registered office at King Charles House, Park End Street, Oxford, OX1 1JD, United Kingdom.
The contents of this e-mail and any attachments are confidential to the intended recipient. If you are not the intended recipient please do not use or publish its contents, contact Genomics plc immediately at info@xxxxxxxxxxxxxxx, then delete. You may not copy, forward, use or disclose the contents of this email to anybody else if you are not the intended recipient. Emails are not secure and may contain viruses.

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

