Hi,

This is one we've seen before: issue #326, http://tracker.newdream.net/issues/326

Was that the first (and only?) osd to fail? What kind of workload were you
subjecting the cluster to? Just the file system? RBD? Anything unusual?

Also, can you confirm which version of the code you were running? The osd
log at /var/log/ceph/osd.*.log should have a version number and sha1 id,
something like

  ceph version 0.22~rc (3cd9d853cd58c79dc12427be8488e57970abda04)

Thanks!
sage

On Mon, 6 Sep 2010, Leander Yu wrote:
> Hi all,
> I have set up a ceph cluster with 10 osds, 2 mds, and 3 mons. It ran
> fine at first, but after one day some of the osds crashed with the
> failed asserts below. I am using the unstable trunk; ceph.conf is
> attached.
>
> -------------- osd 3 -----------------
> osd/PG.h: In function 'void PG::IndexedLog::index(PG::Log::Entry&)':
> osd/PG.h:429: FAILED assert(caller_ops.count(e.reqid) == 0)
> 1: (OSD::_process_pg_info(unsigned int, int, PG::Info&, PG::Log&,
>    PG::Missing&, std::map<int, MOSDPGInfo*, std::less<int>,
>    std::allocator<std::pair<int const, MOSDPGInfo*> > >*, int&)+0xb06)
>    [0x4cf426]
> 2: (OSD::handle_pg_log(MOSDPGLog*)+0xa9) [0x4cf999]
> 3: (OSD::_dispatch(Message*)+0x3ed) [0x4e7dfd]
> 4: (OSD::ms_dispatch(Message*)+0x39) [0x4e86c9]
> 5: (SimpleMessenger::dispatch_entry()+0x789) [0x46b5f9]
> 6: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x45849c]
> 7: (Thread::_entry_func(void*)+0xa) [0x46c0ca]
> 8: (()+0x6a3a) [0x7f69fd39ea3a]
> 9: (clone()+0x6d) [0x7f69fc5bc77d]
>
> -------------- osd 7 --------------------
> osd/ReplicatedPG.cc: In function 'void ReplicatedPG::sub_op_pull(MOSDSubOp*)':
> osd/ReplicatedPG.cc:3021: FAILED assert(r == 0)
> 1: (OSD::dequeue_op(PG*)+0x344) [0x4e6fd4]
> 2: (ThreadPool::worker()+0x28f) [0x5b5a9f]
> 3: (ThreadPool::WorkThread::entry()+0xd) [0x4f0acd]
> 4: (Thread::_entry_func(void*)+0xa) [0x46c0ca]
> 5: (()+0x6a3a) [0x7efff4f12a3a]
> 6: (clone()+0x6d) [0x7efff413077d]
>
> Please let me know if you need more
> information. I still have the environment around, so I can collect
> more data for debugging.
>
> Thanks.
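P.S. One way to pull the version and sha1 out of an osd log mechanically is
a grep/sed one-liner; a rough sketch below, assuming the log path and line
format shown above (both may differ on your install):

```shell
# On a real node you would run something like:
#   grep -m1 'ceph version' /var/log/ceph/osd.0.log
# Self-contained demo on a sample version line:
line='ceph version 0.22~rc (3cd9d853cd58c79dc12427be8488e57970abda04)'
# Extract the version number and the sha1 id separately.
version=$(printf '%s\n' "$line" | sed -n 's/^ceph version \([^ ]*\) (.*)$/\1/p')
sha1=$(printf '%s\n' "$line" | sed -n 's/^.*(\([0-9a-f]*\))$/\1/p')
echo "$version $sha1"
```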