On Fri, 4 Apr 2014, Amit Tiwary wrote:
> Sage Weil <sage <at> inktank.com> writes:
> >
> > Hi Amit,
> >
> > > common/Mutex.cc: In function 'void Mutex::Lock(bool)' thread 7f615f275700 time 2014-04-04 09:03:22.128731
> > > common/Mutex.cc: 93: FAILED assert(r == 0)
> > > ceph version 0.67.4 (ad85b8bfafea6232d64cb7ba76a8b6e8252fa0c7)
> > > 1: (Mutex::Lock(bool)+0x1d3) [0x7f61576ac763]
> > > 2: (librados::IoCtxImpl::operate_read(object_t const&, ObjectOperation*, ceph::buffer::list*)+0x17b) [0x7f615765069b]
> >
> > This Mutex assertion usually triggers in use-after-free cases where the
> > pthread mutex id is invalid (because it has been deallocated). My guess
> > is that your IoCtx has been freed, or the shutdown() method has been
> > called on the cluster handle... is that possible? (Obviously not with
> > the code fragment above, but I'm guessing that isn't a straight
> > copy+paste from your code?)
> >
> > sage
>
> Thanks, Sage, for your input. I will look into the code to figure out
> whether there are any calls to IoCtx::close() or Rados::shutdown().
> However, on closer inspection I find a lot of pthread_create() calls
> failing (see log below) with error 11 (EAGAIN), indicating "insufficient
> resources to create another thread, or a system-imposed limit on the
> number of threads was encountered". Could the Mutex assertion failure be
> a side effect of this?
> -------------------------------
> Thread::try_create(): pthread_create failed with error 11
> common/Thread.cc: In function 'void Thread::create(size_t)' thread 7f615f275700 time 2014-04-04 08:57:03.511877
> common/Thread.cc: 110: FAILED assert(ret == 0)
> ceph version 0.67.4 (ad85b8bfafea6232d64cb7ba76a8b6e8252fa0c7)
> 1: (()+0x36004f) [0x7f615783804f]
> 2: (SimpleMessenger::connect_rank(entity_addr_t const&, int, Connection*, Message*)+0x17b) [0x7f61577c9b9b]
> 3: (SimpleMessenger::get_connection(entity_inst_t const&)+0x244) [0x7f61577ce824]
> 4: (Objecter::get_session(int)+0x1d1) [0x7f615765b391]
> 5: (Objecter::recalc_op_target(Objecter::Op*)+0x336) [0x7f615765c0d6]
> 6: (Objecter::_op_submit(Objecter::Op*)+0x43) [0x7f6157665993]
> 7: (librados::IoCtxImpl::operate_read(object_t const&, ObjectOperation*, ceph::buffer::list*)+0x2dc) [0x7f61576507fc]
> 8: (librados::IoCtxImpl::stat(object_t const&, unsigned long*, long*)+0x185) [0x7f6157653b05]
> 9: (librados::IoCtx::stat(std::string const&, unsigned long*, long*)+0x58) [0x7f6157628498]

Hmm, perhaps it's related. It sounds like you need to increase the ulimit
on the number of open files (ulimit -n), as we create many threads and
sockets to communicate with the cluster.

sage
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html