Re: Random OSD failures - FAILED assert

Most likely fixed in firefly.
-Sam

----- Original Message -----
From: "Kostis Fardelas" <dante1234@xxxxxxxxx>
To: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
Sent: Tuesday, March 17, 2015 12:30:43 PM
Subject: Random OSD failures - FAILED assert

Hi,
we are running Ceph v0.72.2 (emperor) from the Ceph emperor repo. Over
the last week we had two random OSD crashes (one during cluster
recovery and one while the cluster was healthy) with the same symptom:
the OSD process crashes, writes the following trace to its log, and is
marked down and out. We are preparing to upgrade the cluster to
firefly, but we would like to know whether this is a known bug that is
fixed in more recent versions, and how best to troubleshoot this
specific failure. For which subsystems should we raise the debug level
to collect more information?
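As a starting point for the extra logging, a commonly used set of OSD-side debug levels is sketched below as a ceph.conf fragment. This is an assumption, not a confirmed answer: it assumes the `debug osd`, `debug filestore`, `debug journal` and `debug ms` options are available in this release, and the levels chosen (20 for the OSD and storage layers, 1 for the messenger) are typical values that produce very large logs.

```
# Assumed starting point: verbose logging for every OSD on this host.
# Level 20 is extremely verbose; lower it again once a crash is captured.
[osd]
    debug osd = 20
    debug filestore = 20
    debug journal = 20
    debug ms = 1
```

The same levels can usually be applied to a running daemon without a restart, e.g. `ceph tell osd.<id> injectargs '--debug-osd 20 --debug-ms 1'`, where `<id>` is a placeholder for the OSD number.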

2015-03-16 20:44:18.768488 7f516d4c9700 -1 osd/ReplicatedPG.cc: In function 'void ReplicatedPG::sub_op_modify(OpRequestRef)' thread 7f516d4c9700 time 2015-03-16 20:44:18.764353
osd/ReplicatedPG.cc: 5570: FAILED assert(!pg_log.get_missing().is_missing(soid))

 ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60)
 1: (ReplicatedPG::sub_op_modify(std::tr1::shared_ptr<OpRequest>)+0xae0) [0x9182c0]
 2: (ReplicatedPG::do_sub_op(std::tr1::shared_ptr<OpRequest>)+0x117) [0x9184f7]
 3: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x381) [0x8f12a1]
 4: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x316) [0x6f7096]
 5: (OSD::OpWQ::_process(boost::intrusive_ptr<PG>, ThreadPool::TPHandle&)+0x198) [0x70e048]
 6: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest> >, boost::intrusive_ptr<PG> >::_void_process(void*, ThreadPool::TPHandle&)+0xae) [0x7494ce]
 7: (ThreadPool::worker(ThreadPool::WorkThread*)+0x68a) [0xa517fa]
 8: (ThreadPool::WorkThread::entry()+0x10) [0xa52a50]
 9: (()+0x6b50) [0x7f5199f52b50]
 10: (clone()+0x6d) [0x7f519871e70d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---

Regards,
Kostis
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com