On Tue, 6 Feb 2018, 陶冬冬 wrote:
> Thanks Varada, I didn't find any useful message.

Yeah, it looks like it's probably EIO, but it's surprising you don't see
anything in the dmesg output.

There were several patches to master that improve the error reporting and
propagation so that EIO reaches the OSD (which will allow scrub to do a
repair). Adding them to the queue for 12.2.3!

I would let your cluster heal around this OSD (if it hasn't already).
Then either wipe and reprovision it, or wait for 12.2.3 and we can see if
it is handled more gracefully.

Thanks!
sage

>
> > On Feb 6, 2018, at 1:43 PM, Varada Kari <varada.kari@xxxxxxxxx> wrote:
> >
> > It seems you are not able to read from the disk. Could you check kern.log
> > and syslog for any disk errors?
> >
> > Varada
> >
> > On Tue, Feb 6, 2018 at 8:45 AM, 陶冬冬 <tdd21151186@xxxxxxxxx> wrote:
> >> Dear Cephers,
> >>
> >> crash stack:
> >> c/os/bluestore/BlueStore.cc: 6661: FAILED assert(r == 0)
> >> ```
> >> ceph version 12.2.2 (cf0baeeeeba3b47f9427c6c97e2144b094b7e5ba) luminous (stable)
> >> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x55cc7bf20550]
> >> 2: (BlueStore::_do_read(BlueStore::Collection*, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::list&, unsigned int)+0x1e50) [0x55cc7bdeb360]
> >> 3: (BlueStore::read(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ghobject_t const&, unsigned long, unsigned long, ceph::buffer::list&, unsigned int)+0x61a) [0x55cc7bdec50a]
> >> 4: (ReplicatedBackend::be_deep_scrub(hobject_t const&, unsigned int, ScrubMap::object&, ThreadPool::TPHandle&)+0x247) [0x55cc7bc5e697]
> >> 5: (PGBackend::be_scan_list(ScrubMap&, std::vector<hobject_t, std::allocator<hobject_t> > const&, bool, unsigned int, ThreadPool::TPHandle&)+0x290) [0x55cc7bb99720]
> >> 6: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, unsigned int, ThreadPool::TPHandle&)+0x215) [0x55cc7ba47825]
> >> 7: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x5e6) [0x55cc7ba48116]
> >> 8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x720) [0x55cc7bb05110]
> >> 9: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f9) [0x55cc7b98f899]
> >> 10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x57) [0x55cc7bc07897]
> >> 11: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xfce) [0x55cc7b9bd43e]
> >> 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x839) [0x55cc7bf26069]
> >> 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55cc7bf28000]
> >> 14: (()+0x7e25) [0x7f6de7a08e25]
> >> 15: (clone()+0x6d) [0x7f6de6afc34d]
> >> ```
> >>
> >> Could anyone please help take a look at why this would happen?
> >> Currently, this OSD keeps crashing because of this assertion failure.
> >>
> >> Regards,
> >> Dongdong
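
[Editor's note] For readers following the thread, here is a minimal, self-contained C++ sketch of the behavioral difference Sage describes: the pre-12.2.3 pattern asserts on any nonzero read result and aborts the OSD process, while a propagating pattern returns the errno so the caller (e.g. a deep scrub) could flag the object for repair instead. This is illustrative only, not Ceph source; the names `backend_read`, `read_and_assert`, and `read_and_propagate` and the EIO injection flag are assumptions made for the example.

```
// Illustrative sketch only -- not Ceph source. Shows why assert(r == 0)
// on a read path turns a single media error (EIO) into a process abort,
// and what propagating the error up to the caller looks like instead.
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <vector>

// Stand-in for a low-level device/checksum read; -EIO mimics a bad sector.
static int backend_read(std::vector<char>& out, bool inject_eio) {
    if (inject_eio)
        return -EIO;
    out.assign(4096, '\0');
    return 0;
}

// Pattern matching the crash above: any read failure trips the assert and
// takes the whole process (here, the OSD) down with it.
static void read_and_assert(bool inject_eio) {
    std::vector<char> buf;
    int r = backend_read(buf, inject_eio);
    assert(r == 0);  // "FAILED assert(r == 0)" -> abort
}

// Propagating pattern: return the errno so the caller (e.g. a deep scrub)
// can record the object as inconsistent and repair it from another replica.
static int read_and_propagate(bool inject_eio, std::vector<char>& buf) {
    int r = backend_read(buf, inject_eio);
    if (r < 0) {
        std::fprintf(stderr, "read failed (%d); flag object for repair\n", r);
        return r;
    }
    return static_cast<int>(buf.size());
}

int main() {
    std::vector<char> buf;
    int r = read_and_propagate(true, buf);  // error is reported, process survives
    std::printf("propagated result: %d\n", r);

    // read_and_assert(true);  // uncommenting this aborts, like the OSD crash
    return 0;
}
```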