Re: OSD Repeated Failure

On Sat, Feb 11, 2017 at 2:51 PM, Ashley Merrick <ashley@xxxxxxxxxxxxxx> wrote:
> Hello,
>
>
>
> I have a particular OSD (53), which at random will crash with the OSD
> process stopping.
>
>
>
> OS: Debian 8.x
>
> CEPH : ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)
>
>
>
> From the logs at the time of the OSD being marked as crashed I can only see
> the following:
>
>
>
>     -4> 2017-02-10 23:40:16.820894 7fadbd049700  1 -- 172.16.3.7:6825/16969
> <== osd.26 172.16.2.104:0/5812 1 ==== osd_ping(ping e29842 stamp 2017-02$
>
>     -3> 2017-02-10 23:40:16.820918 7fadbd049700  1 -- 172.16.3.7:6825/16969
> --> 172.16.2.104:0/5812 -- osd_ping(ping_reply e29842 stamp 2017-02-10 2$
>
>     -2> 2017-02-10 23:40:16.822436 7faddb149700  1 --
> 172.16.2.107:6820/16969 <== client.8222771 172.16.2.2:0/1125091221 86 ====
> osd_op(client.82227$
>
>     -1> 2017-02-10 23:40:16.822453 7faddb149700  5 -- op tracker -- seq:
> 670, time: 2017-02-10 23:40:16.822453, event: queued_for_pg, op: osd_op(cli$
>
>      0> 2017-02-10 23:40:16.832241 7fadd0631700 -1 *** Caught signal
> (Aborted) **
>
> in thread 7fadd0631700 thread_name:tp_osd_tp
>
>
>
> ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)
>
> 1: (()+0x951cc7) [0x5556d8c4bcc7]
>
> 2: (()+0xf890) [0x7fadf5f8e890]
>
> 3: (gsignal()+0x37) [0x7fadf3fd5067]
>
> 4: (abort()+0x148) [0x7fadf3fd6448]
>
> 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x256) [0x5556d8d51296]
>
> 6: (FileStore::read(coll_t const&, ghobject_t const&, unsigned long,
> unsigned long, ceph::buffer::list&, unsigned int, bool)+0xd7c)
> [0x5556d89e68ec]
>
> 7: (ReplicatedBackend::objects_read_sync(hobject_t const&, unsigned long,
> unsigned long, unsigned int, ceph::buffer::list*)+0xcd) [0x5556d885ce7d]
>
> 8: (ReplicatedPG::do_osd_ops(ReplicatedPG::OpContext*, std::vector<OSDOp,
> std::allocator<OSDOp> >&)+0x6355) [0x5556d87f6515]
>
> 9: (ReplicatedPG::prepare_transaction(ReplicatedPG::OpContext*)+0x61)
> [0x5556d8802101]
>
> 10: (ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)+0x936)
> [0x5556d880a566]
>
> 11: (ReplicatedPG::do_op(std::shared_ptr<OpRequest>&)+0x37c3)
> [0x5556d880f3d3]
>
> 12: (ReplicatedPG::do_request(std::shared_ptr<OpRequest>&,
> ThreadPool::TPHandle&)+0x727) [0x5556d87c6ae7]
>
> 13: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::shared_ptr<OpRequest>,
> ThreadPool::TPHandle&)+0x420) [0x5556d866b650]
>
> 14: (PGQueueable::RunVis::operator()(std::shared_ptr<OpRequest>&)+0x6a)
> [0x5556d866b8aa]
>
> 15: (OSD::ShardedOpWQ::_process(unsigned int,
> ceph::heartbeat_handle_d*)+0x87a) [0x5556d8687f7a]
>
> 16: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x8b6)
> [0x5556d8d40c56]
>
> 17: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5556d8d42c10]
>
> 18: (()+0x8064) [0x7fadf5f87064]
>
> 19: (clone()+0x6d) [0x7fadf408862d]
>
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
> interpret this.
>
>
>
>
>
> Does this relate to anything or do I need to dig deeper to find the issue?

It's likely a filesystem or hardware problem, since the OSD is failing an
assert in FileStore::read (frame 6 of the backtrace above).

Could you thoroughly check the filesystem and the underlying hardware?

You can possibly get more information about the specifics of the issue
by capturing a log with debugging turned right up (20).
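
Once a log like that has been captured, a small script can help locate the
interesting lines in what will be a very large file. The following is only a
rough sketch, not an official Ceph tool. It assumes that the failed assertion
is logged as a line containing "FAILED assert(...)" and that the crash marker
looks like the "*** Caught signal (Aborted) **" line quoted above. The
function name find_crash_markers and the sample lines are made up for
illustration.

```python
import re

# Assumption: a failed Ceph assertion is logged as "... FAILED assert(<condition>)".
ASSERT_RE = re.compile(r"FAILED assert\((?P<cond>.*)\)\s*$")
# Matches the crash marker seen in the quoted log, e.g. "*** Caught signal (Aborted) **".
SIGNAL_RE = re.compile(r"\*\*\* Caught signal \((?P<sig>[^)]+)\)")


def find_crash_markers(lines):
    """Return (line_number, kind, detail) tuples for assert and signal lines.

    `lines` is any iterable of strings, such as an open log file.
    Line numbers are 1-based.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        match = ASSERT_RE.search(line)
        if match:
            hits.append((number, "assert", match.group("cond")))
            continue
        match = SIGNAL_RE.search(line)
        if match:
            hits.append((number, "signal", match.group("sig")))
    return hits


# Hypothetical sample lines, used only to show the call shape.
sample = [
    "some unrelated log line\n",
    "example/File.cc: 10: FAILED assert(result != -5)\n",
    "0> 2017-02-10 23:40:16.832241 7fadd0631700 -1 *** Caught signal (Aborted) **\n",
]
for number, kind, detail in find_crash_markers(sample):
    print(f"line {number}: {kind}: {detail}")
```

To run it against a real log, pass an open file object instead of the sample
list, for example `find_crash_markers(open(path, errors="replace"))`, where
path points at the log of the affected OSD. The lines just before each hit
are usually the ones worth reading.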

>
>
>
> ,Ashley
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Cheers,
Brad