Hi Sage,

> Unfortunately I can't see what the thread gets stuck doing after it stops
> doing work (at an apparently normal point). Is there any chance you can
> attach to it with gdb as soon as the log slows down and the initial
> timeout messages appear? Or check the core file and see what thread
> 7fd8c1ff3700 is up to?

Here's the info from the core file:
(gdb) thread 45
[Switching to thread 45 (Thread 0x7fd8c1ff3700 (LWP 995))]
#0 0x000000360de0acb4 in pthread_rwlock_rdlock () from /lib64/libpthread.so.0
(gdb) bt
#0 0x000000360de0acb4 in pthread_rwlock_rdlock () from /lib64/libpthread.so.0
#1 0x00000000006e34a9 in RWLock::get_read (this=0x1aa54c8) at common/RWLock.h:51
#2 0x00000000006a0857 in OSD::queue_want_up_thru (this=this@entry=0x1aa44f0, want=want@entry=460) at osd/OSD.cc:2585
Python Exception <type 'exceptions.IndexError'> list index out of range:
#3 0x00000000006ce175 in OSD::process_peering_events (this=0x1aa44f0, pgs=std::list) at osd/OSD.cc:6193
Python Exception <type 'exceptions.IndexError'> list index out of range:
#4 0x0000000000709617 in OSD::PeeringWQ::_process (this=<optimized out>, pgs=std::list) at osd/OSD.h:718
#5 0x00000000008cbefc in ThreadPool::worker (this=0x1aa4938, wt=0x3c539a0) at common/WorkQueue.cc:113
#6 0x00000000008cce70 in ThreadPool::WorkThread::entry (this=<optimized out>) at common/WorkQueue.h:288
#7 0x000000360de07d14 in start_thread () from /lib64/libpthread.so.0
#8 0x000000360d6f167d in clone () from /lib64/libc.so.6

Does this help?
--
Jens Kristian Søgaard, Mermaid Consulting ApS,
jens@xxxxxxxxxxxxxxxxxxxx,
http://www.mermaidconsulting.com/