Re: Hit suicide timeout after adding new osd

Jens Kristian Søgaard <jens@xxxxxxxxxxxxxxxxxxxx> · Fri, 18 Jan 2013 12:24:30 +0100

Hi Sage,

> Unfortunately I can't see what the thread gets stuck doing after it stops
> doing work (at an apparently normal point).  Is there any chance you can
> attach to it with gdb as soon as the log slows down and the initial
> timeout messages appear?  Or check the core file and see what thread
> 7fd8c1ff3700 is up to?

Here's the info from the core file:

(gdb) thread 45
[Switching to thread 45 (Thread 0x7fd8c1ff3700 (LWP 995))]
#0  0x000000360de0acb4 in pthread_rwlock_rdlock () from /lib64/libpthread.so.0

(gdb) bt
#0  0x000000360de0acb4 in pthread_rwlock_rdlock () from /lib64/libpthread.so.0
#1  0x00000000006e34a9 in RWLock::get_read (this=0x1aa54c8) at common/RWLock.h:51
#2  0x00000000006a0857 in OSD::queue_want_up_thru (this=this@entry=0x1aa44f0, want=want@entry=460) at osd/OSD.cc:2585
Python Exception <type 'exceptions.IndexError'> list index out of range:
#3  0x00000000006ce175 in OSD::process_peering_events (this=0x1aa44f0, pgs=std::list) at osd/OSD.cc:6193
Python Exception <type 'exceptions.IndexError'> list index out of range:
#4  0x0000000000709617 in OSD::PeeringWQ::_process (this=<optimized out>, pgs=std::list) at osd/OSD.h:718
#5  0x00000000008cbefc in ThreadPool::worker (this=0x1aa4938, wt=0x3c539a0) at common/WorkQueue.cc:113
#6  0x00000000008cce70 in ThreadPool::WorkThread::entry (this=<optimized out>) at common/WorkQueue.h:288
#7  0x000000360de07d14 in start_thread () from /lib64/libpthread.so.0
#8  0x000000360d6f167d in clone () from /lib64/libc.so.6

Does this help?
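
In case it is useful for catching this on a live process next time, here is a minimal sketch of how the backtraces for every thread could be collected non-interactively, either by attaching to a pid or from a core file. It assumes gdb is on the PATH; the helper names (gdb_backtrace_cmd, dump_backtraces) and any paths passed in are hypothetical and not part of Ceph.

```python
import subprocess


def gdb_backtrace_cmd(pid=None, binary=None, core=None):
    """Build a gdb command line that prints a backtrace for every thread.

    Pass either pid (attach to a running process) or binary plus core
    (inspect a core file). All values are supplied by the caller.
    """
    # -batch makes gdb exit once the -ex commands have run.
    cmd = ["gdb", "-batch", "-ex", "thread apply all bt"]
    if pid is not None:
        cmd += ["-p", str(pid)]
    elif binary and core:
        cmd += [binary, core]
    else:
        raise ValueError("need either pid, or both binary and core")
    return cmd


def dump_backtraces(**kwargs):
    """Run gdb and return its stdout (hypothetical convenience wrapper)."""
    result = subprocess.run(
        gdb_backtrace_cmd(**kwargs),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout
```

Calling dump_backtraces(pid=...) with the pid of the ceph-osd process as soon as the first timeout messages appear should give output in the same form as the backtrace above, for all threads at once.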

--
Jens Kristian Søgaard, Mermaid Consulting ApS,
jens@xxxxxxxxxxxxxxxxxxxx,
http://www.mermaidconsulting.com/