Hi Stefan,

On Thu, 23 Sep 2010, Stefan Majer wrote:
> Hi,
>
> we saw on one of our OSDs (16 in total) the following assert.
>
> osd/OSD.cc: In function 'void OSD::start_recovery_op(PG*, const sobject_t&)':
> osd/OSD.cc:4250: FAILED assert(recovery_oids.count(soid) == 0)
>  1: (PG::start_recovery_op(sobject_t const&)+0x127) [0x525627]
>  2: (ReplicatedPG::recover_object_replicas(sobject_t const&)+0x191) [0x482881]
>  3: (ReplicatedPG::recover_replicas(int)+0x2db) [0x482ddb]
>  4: (ReplicatedPG::start_recovery_ops(int)+0x92) [0x4832f2]
>  5: (OSD::do_recovery(PG*)+0x1e3) [0x4b8cf3]
>  6: (ThreadPool::worker()+0x291) [0x5ac5d1]
>  7: (ThreadPool::WorkThread::entry()+0xd) [0x4ec93d]
>  8: (Thread::_entry_func(void*)+0x7) [0x46eee7]
>  9: (()+0x77e1) [0x7f952e6917e1]
>  10: (clone()+0x6d) [0x7f952d8b551d]
>
> Any hints to further nail down this problem.

Without logs, it's hard to tell what caused it. Has it only happened the
one time? Did the OSD behave when it was restarted?

Generally speaking, 'debug osd = 20' and 'debug ms = 1' would capture the
context needed to identify the problem, but it's a lot of logging and
will slow things down some.

sage
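
For reference, the two debug settings mentioned above would normally go in the OSD section of ceph.conf. This is only a minimal sketch, assuming the usual config layout with an [osd] section; adjust it to however the cluster's config file is actually organized, and restart the OSD so the settings take effect:

```
[osd]
        ; verbose OSD logging, as suggested above (slows things down)
        debug osd = 20
        ; log messenger traffic
        debug ms = 1
```

Since this produces a lot of log output, it is probably worth removing these lines again once the assert has been reproduced and the log captured.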