While I cannot reproduce what you are seeing, I can see how it could
theoretically deadlock on thread shutdown if the process was being shut
down before the service thread had a chance to actually start executing.
I've opened a tracker ticket for the issue [1].

[1] http://tracker.ceph.com/issues/20776

On Tue, Jul 25, 2017 at 6:18 PM, Kjetil Jørgensen <kjetil@xxxxxxxxxxxx> wrote:
> Hi,
>
> I'm not sure yet whether or not this is made worse by config, however - if I
> do something along the lines of:
>>
>> seq 100 | xargs -P100 -n1 bash -c 'exec rbd.original showmapped'
>
>
> I'll end up with at least one of the invocations deadlocked like below.
> Doing the same on our v10.2.7 clusters seems to work fine.
>
> The stacktraces according to GDB looks something like this for all the ones
> I've looked at at least:
>>
>> warning: the debug information found in "/usr/bin/rbd" does not match
>> "/usr/bin/rbd.original" (CRC mismatch).
>> # Yes - we've diverted rbd to rbd.original with a shell-wrapper around it
>
>
>> [New LWP 285438]
>> [New LWP 285439]
>> [Thread debugging using libthread_db enabled]
>> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
>> 0x00007fbbea58798d in pthread_join (threadid=140444952844032,
>> thread_return=thread_return@entry=0x0) at pthread_join.c:90
>> 90 pthread_join.c: No such file or directory.
>> Thread 3 (Thread 0x7fbbe3865700 (LWP 285439)):
>> #0 pthread_cond_wait@@GLIBC_2.3.2 () at
>> ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
>> #1 0x000055a852fcf896 in Cond::Wait (mutex=..., this=0x55a85cdeb258) at
>> ./common/Cond.h:56
>> #2 CephContextServiceThread::entry (this=0x55a85cdeb1c0) at
>> common/ceph_context.cc:101
>> #3 0x00007fbbea5866ba in start_thread (arg=0x7fbbe3865700) at
>> pthread_create.c:333
>> #4 0x00007fbbe80743dd in clone () at
>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>> Thread 2 (Thread 0x7fbbe4804700 (LWP 285438)):
>> #0 pthread_cond_wait@@GLIBC_2.3.2 () at
>> ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
>> #1 0x000055a852fb297b in ceph::log::Log::entry (this=0x55a85cd98830) at
>> log/Log.cc:457
>> #2 0x00007fbbea5866ba in start_thread (arg=0x7fbbe4804700) at
>> pthread_create.c:333
>> #3 0x00007fbbe80743dd in clone () at
>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>> Thread 1 (Thread 0x7fbbfda1e100 (LWP 285436)):
>> #0 0x00007fbbea58798d in pthread_join (threadid=140444952844032,
>> thread_return=thread_return@entry=0x0) at pthread_join.c:90
>> #1 0x000055a852fb6270 in Thread::join (this=this@entry=0x55a85cdeb1c0,
>> prval=prval@entry=0x0) at common/Thread.cc:171
>> #2 0x000055a852fca060 in CephContext::join_service_thread
>> (this=this@entry=0x55a85cd95780) at common/ceph_context.cc:637
>> #3 0x000055a852fcc2c7 in CephContext::~CephContext (this=0x55a85cd95780,
>> __in_chrg=<optimized out>) at common/ceph_context.cc:507
>> #4 0x000055a852fcc9bc in CephContext::put (this=0x55a85cd95780) at
>> common/ceph_context.cc:578
>> #5 0x000055a852eac2b1 in
>> boost::intrusive_ptr<CephContext>::~intrusive_ptr (this=0x7ffef7ef5060,
>> __in_chrg=<optimized out>) at
>> /usr/include/boost/smart_ptr/intrusive_ptr.hpp:97
>> #6 main (argc=<optimized out>, argv=<optimized out>) at
>> tools/rbd/rbd.cc:17
>
>
> Cheers,
> --
> Kjetil Joergensen <kjetil@xxxxxxxxxxxx>
> Staff Curmudgeon, Medallia Inc
> Phone: +1 (650) 739-6580
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

-- 
Jason
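
For anyone following along, here is a rough sketch of the kind of race described in this thread: thread 1 sits in pthread_join while the service thread sits in a condition wait, which is what you would expect if the exit request was signaled before the service thread started waiting. This is not Ceph's code. ServiceThread, request_exit and shutdown_right_after_start are hypothetical names made up for illustration, and the sketch only shows the general pattern that avoids a lost wakeup (checking the exit flag under the lock before blocking), assuming that is indeed what is happening here.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a background service thread. The class and
// member names are illustrative; this is not Ceph's CephContextServiceThread.
class ServiceThread {
public:
  void start() { worker_ = std::thread([this] { entry(); }); }

  // May run before entry() has taken the lock for the first time.
  void request_exit() {
    {
      std::lock_guard<std::mutex> guard(lock_);
      exit_requested_ = true;
    }
    cond_.notify_all();
  }

  void join() {
    if (worker_.joinable()) worker_.join();
  }

  // Only meaningful after join() has returned.
  bool finished() const { return finished_; }

private:
  void entry() {
    std::unique_lock<std::mutex> guard(lock_);
    // The predicate is evaluated before blocking, so an exit request that
    // arrived before this thread got here is still observed. A bare
    // cond_.wait(guard) at this point could sleep forever in that case,
    // leaving the caller stuck in join().
    cond_.wait(guard, [this] { return exit_requested_; });
    finished_ = true;
  }

  std::mutex lock_;
  std::condition_variable cond_;
  std::thread worker_;
  bool exit_requested_ = false;
  bool finished_ = false;
};

// Start the thread and shut it down immediately, mimicking a process that
// exits right after startup. Returns true if the thread exited cleanly.
bool shutdown_right_after_start() {
  ServiceThread service;
  service.start();
  service.request_exit();
  service.join();
  return service.finished();
}
```

With the predicate form of wait, the order in which request_exit() and entry() acquire the lock does not matter, so join() returns in either case.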