Hi all,

I have set up a Ceph cluster with 10 OSDs, 2 MDSs and 3 monitors. It ran fine at the beginning. However, after one day, some of the OSDs crashed with the following assert failures. I am using the unstable trunk. My ceph.conf is attached.

-------------- osd 3 -----------------
osd/PG.h: In function 'void PG::IndexedLog::index(PG::Log::Entry&)':
osd/PG.h:429: FAILED assert(caller_ops.count(e.reqid) == 0)
 1: (OSD::_process_pg_info(unsigned int, int, PG::Info&, PG::Log&, PG::Missing&, std::map<int, MOSDPGInfo*, std::less<int>, std::allocator<std::pair<int const, MOSDPGInfo*> > >*, int&)+0xb06) [0x4cf426]
 2: (OSD::handle_pg_log(MOSDPGLog*)+0xa9) [0x4cf999]
 3: (OSD::_dispatch(Message*)+0x3ed) [0x4e7dfd]
 4: (OSD::ms_dispatch(Message*)+0x39) [0x4e86c9]
 5: (SimpleMessenger::dispatch_entry()+0x789) [0x46b5f9]
 6: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x45849c]
 7: (Thread::_entry_func(void*)+0xa) [0x46c0ca]
 8: (()+0x6a3a) [0x7f69fd39ea3a]
 9: (clone()+0x6d) [0x7f69fc5bc77d]

-------------- osd 7 --------------------
osd/ReplicatedPG.cc: In function 'void ReplicatedPG::sub_op_pull(MOSDSubOp*)':
osd/ReplicatedPG.cc:3021: FAILED assert(r == 0)
 1: (OSD::dequeue_op(PG*)+0x344) [0x4e6fd4]
 2: (ThreadPool::worker()+0x28f) [0x5b5a9f]
 3: (ThreadPool::WorkThread::entry()+0xd) [0x4f0acd]
 4: (Thread::_entry_func(void*)+0xa) [0x46c0ca]
 5: (()+0x6a3a) [0x7efff4f12a3a]
 6: (clone()+0x6d) [0x7efff413077d]

Please let me know if you need more information. I am keeping the environment intact so I can collect more data for debugging.

Thanks.
Attachment:
ceph.conf