Hi, you can create an issue on the tracker: http://tracker.ceph.com/projects/ceph/issues
We have noticed some osd_tp timeout suicide situations that may have the same root cause as your post.

2016-09-27 9:47 GMT+08:00 Jeegn Chen <jeegnchen@xxxxxxxxx>:
> Hi all,
>
> I noticed a deadlock phenomenon in a Hammer OSD (strictly speaking,
> lock starvation may be more precise, since there is no lock cycle).
> Although I found the threads blocking each other, I have not figured
> out how they came to this state. Any hint will be appreciated.
>
> As you can see in the following gdb inspection:
> * thread 11 has taken osd_lock but is waiting for heartbeat_lock
> * thread 37 has taken heartbeat_lock but is waiting for a
>   PipeConnection's lock
> * thread 286 has taken the PipeConnection's lock needed by thread 37,
>   but it never returns from Pipe::tcp_read_nonblocking() to
>   Pipe::read_message(); as a result, the PipeConnection's lock is
>   never released by Pipe::read_message()
> * since osd_lock and heartbeat_lock are held indefinitely, most of the
>   OSD's threads cannot make progress and the OSD looks deadlocked
>
> A more detailed check on thread 286 shows that EAGAIN is always
> returned when recv() is invoked in Pipe::do_recv(). Thus the
> Pipe::do_recv() logic retries forever, which is why
> Pipe::tcp_read_nonblocking() never finishes.
>
> I examined the Pipe object with gdb and found it is a connection from
> 192.168.30.41:63805 to 192.168.30.105:6805. So I used "lsof" and
> "netstat" to check both servers (see the results following the gdb
> output). It seems that 192.168.30.41:63805->192.168.30.105:6805 is
> still ESTABLISHED on 192.168.30.41, but the TCP connection no longer
> exists from 192.168.30.105's perspective.
> My understanding is that even if a connection is half-closed on one
> side, recv() on the other side should return 0 and give no error. But
> why does recv() return EAGAIN constantly?
>
> Any idea how this can happen, or any explanation? Some bug in the
> kernel TCP stack (2.6.32)?
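For reference, the retry logic being described looks roughly like the
following. This is a simplified sketch of the pattern in Hammer's
Pipe::do_recv() as the post describes it, not the verbatim source:
EAGAIN is retried even when the caller passed MSG_DONTWAIT (flags=64 in
the thread 286 backtrace below), so a socket that keeps failing with
EAGAIN turns the call into a busy loop:

    // Simplified sketch of the do_recv() retry pattern described in the
    // post (not the verbatim Hammer source).
    #include <cerrno>
    #include <sys/socket.h>
    #include <sys/types.h>

    ssize_t do_recv_sketch(int sd, char *buf, size_t len, int flags)
    {
    again:
      ssize_t got = ::recv(sd, buf, len, flags);
      if (got < 0) {
        if (errno == EAGAIN || errno == EINTR)
          goto again;   // spins forever if recv() keeps failing with EAGAIN
        return -1;      // hard error: the caller tears the Pipe down
      }
      if (got == 0)
        return -1;      // orderly close by the peer
      return got;
    }

While that loop spins, the PipeConnection lock taken earlier in
Pipe::read_message() is never released, which is exactly what threads
37 and 11 end up queued behind.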
>
>
> (gdb) thread 11
> [Switching to thread 11 (Thread 0x7fcebcd2f700 (LWP 29061))]#0 0x00000032a260e334 in __lll_lock_wait () from /lib64/libpthread.so.0
> (gdb) bt
> #0  0x00000032a260e334 in __lll_lock_wait () from /lib64/libpthread.so.0
> #1  0x00000032a26095d8 in _L_lock_854 () from /lib64/libpthread.so.0
> #2  0x00000032a26094a7 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3  0x0000000000c0ba3b in Mutex::Lock (this=0x5bb0ce8, no_lockdep=Unhandled dwarf expression opcode 0xf3) at common/Mutex.cc:91
> #4  0x0000000000795c29 in Locker (this=0x5bb0000) at common/Mutex.h:115
> #5  OSD::maybe_update_heartbeat_peers (this=0x5bb0000) at osd/OSD.cc:3432
> #6  0x00000000007bad13 in OSD::tick (this=0x5bb0000) at osd/OSD.cc:3863
> #7  0x00000000007cb299 in Context::complete (this=0x5988a60, r=Unhandled dwarf expression opcode 0xf3) at include/Context.h:65
> #8  0x0000000000c21bbf in SafeTimer::timer_thread (this=0x5bb0070) at common/Timer.cc:105
> #9  0x0000000000c2317d in SafeTimerThread::entry (this=Unhandled dwarf expression opcode 0xf3) at common/Timer.cc:38
> #10 0x00000032a2607aa1 in start_thread () from /lib64/libpthread.so.0
> #11 0x00000032a22e893d in clone () from /lib64/libc.so.6
> (gdb) frame 8
> #8  0x0000000000c21bbf in SafeTimer::timer_thread (this=0x5bb0070) at common/Timer.cc:105
> 105         callback->complete(0);
> (gdb) p &lock
> $1 = (Mutex *) 0x5bb0018
> (gdb) frame 5
> #5  OSD::maybe_update_heartbeat_peers (this=0x5bb0000) at osd/OSD.cc:3432
> 3432        Mutex::Locker l(heartbeat_lock);
> (gdb) p &heartbeat_lock
> $2 = (Mutex *) 0x5bb0ce8
>
> (gdb) thread 37
> [Switching to thread 37 (Thread 0x7fceaa4e4700 (LWP 29763))]#0 0x00000032a260e334 in __lll_lock_wait () from /lib64/libpthread.so.0
> (gdb) bt
> #0  0x00000032a260e334 in __lll_lock_wait () from /lib64/libpthread.so.0
> #1  0x00000032a26095d8 in _L_lock_854 () from /lib64/libpthread.so.0
> #2  0x00000032a26094a7 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3  0x0000000000c0ba3b in Mutex::Lock (this=0x49c38b18, no_lockdep=Unhandled dwarf expression opcode 0xf3) at common/Mutex.cc:91
> #4  0x0000000000ddeecb in Locker (this=0x49c38b00, p=0x7fceaa4e2c60) at common/Mutex.h:115
> #5  PipeConnection::try_get_pipe (this=0x49c38b00, p=0x7fceaa4e2c60) at msg/simple/PipeConnection.cc:38
> #6  0x0000000000db52dc in SimpleMessenger::submit_message (this=0x5aa8e00, m=0x2c6d0b40, con=0x49c38b00, dest_addr=..., dest_type=4, already_locked=false) at msg/simple/SimpleMessenger.cc:443
> #7  0x0000000000db5da7 in SimpleMessenger::_send_message (this=0x5aa8e00, m=0x2c6d0b40, con=0x49c38b00) at msg/simple/SimpleMessenger.cc:136
> #8  0x00000000007a27e0 in OSDService::send_incremental_map (this=0x5bb16c8, since=58626, con=0x49c38b00, osdmap=Unhandled dwarf expression opcode 0xf3) at osd/OSD.cc:1087
> #9  0x00000000007a2a89 in OSDService::share_map_peer (this=0x5bb16c8, peer=28, con=0x49c38b00, map=std::tr1::shared_ptr (count 562) 0x5b367500) at osd/OSD.cc:887
> #10 0x00000000007a8324 in OSD::handle_osd_ping (this=0x5bb0000, m=0x6861f600) at osd/OSD.cc:3599
> #11 0x00000000007a8b2b in OSD::heartbeat_dispatch (this=0x5bb0000, m=0x6861f600) at osd/OSD.cc:5254
> #12 0x0000000000de07ab in ms_deliver_dispatch (this=0x5aa9dc8) at msg/Messenger.h:567
> #13 DispatchQueue::entry (this=0x5aa9dc8) at msg/simple/DispatchQueue.cc:185
> #14 0x0000000000db86dd in DispatchQueue::DispatchThread::entry (this=Unhandled dwarf expression opcode 0xf3) at msg/simple/DispatchQueue.h:103
> #15 0x00000032a2607aa1 in start_thread () from /lib64/libpthread.so.0
> #16 0x00000032a22e893d in clone () from /lib64/libc.so.6
> (gdb) frame 10
> #10 0x00000000007a8324 in OSD::handle_osd_ping (this=0x5bb0000, m=0x6861f600) at osd/OSD.cc:3599
> 3599        service.share_map_peer(from, con.get());
> (gdb) p &heartbeat_lock
> $6 = (Mutex *) 0x5bb0ce8
> (gdb) frame 5
> #5  PipeConnection::try_get_pipe (this=0x49c38b00, p=0x7fceaa4e2c60) at msg/simple/PipeConnection.cc:38
> 38          Mutex::Locker l(lock);
> (gdb) p &lock
> $8 = (Mutex *) 0x49c38b18
> (gdb)
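The wait-for chain in those two backtraces plus thread 286 below boils
down to three mutexes. Here is a minimal, self-contained model of it
(illustrative names only, this is not Ceph code): the reader holds the
connection lock and spins like do_recv(), the heartbeat dispatcher
holds heartbeat_lock and blocks on the connection lock, and the tick
thread holds osd_lock and blocks on heartbeat_lock:

    // Minimal model of the starvation chain described above (not Ceph code).
    #include <atomic>
    #include <chrono>
    #include <cstdio>
    #include <cstdlib>
    #include <mutex>
    #include <thread>

    std::mutex osd_lock, heartbeat_lock, conn_lock;
    std::atomic<bool> reading{true};

    int main() {
      // Models thread 286: holds the connection lock and "reads" forever,
      // like do_recv() looping on EAGAIN.
      std::thread reader([] {
        std::lock_guard<std::mutex> c(conn_lock);
        while (reading.load()) std::this_thread::yield();
      });
      // Models thread 37: holds heartbeat_lock, blocks on the connection lock.
      std::thread heartbeat([] {
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
        std::lock_guard<std::mutex> h(heartbeat_lock);
        std::lock_guard<std::mutex> c(conn_lock);  // blocks forever
      });
      // Models thread 11: holds osd_lock, blocks on heartbeat_lock.
      std::thread tick([] {
        std::this_thread::sleep_for(std::chrono::milliseconds(200));
        std::lock_guard<std::mutex> o(osd_lock);
        std::lock_guard<std::mutex> h(heartbeat_lock);  // blocks forever
      });

      std::this_thread::sleep_for(std::chrono::seconds(2));
      if (!osd_lock.try_lock())
        std::puts("osd_lock still held: the whole OSD appears hung");
      std::_Exit(0);  // nothing ever clears 'reading', so the chain never resolves
    }

Note there is no cycle among the three locks, which matches the post's
observation that this is starvation rather than a classic deadlock: the
chain resolves only if the reader's loop ever exits.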
> (gdb) thread 286
> [Switching to thread 286 (Thread 0x7fce7a9d9700 (LWP 15873))]#0 0x00000032a260ec2c in recv () from /lib64/libpthread.so.0
> (gdb) bt
> #0  0x00000032a260ec2c in recv () from /lib64/libpthread.so.0
> #1  0x0000000000dc8724 in recv (this=0x52fec800, buf=0xe27c000 "\235\302óH[\002\220R1\035\251\334,-(\353L\rJ\247"..., len=4096, flags=64) at /usr/include/bits/socket2.h:45
> #2  Pipe::do_recv (this=0x52fec800, buf=0xe27c000 "\235\302óH[\002\220R1\035\251\334,-(\353L\rJ\247"..., len=4096, flags=64) at msg/simple/Pipe.cc:2430
> #3  0x0000000000dc8990 in Pipe::buffered_recv (this=0x52fec800, buf=0x4d29e3e1 "", len=3823, flags=Unhandled dwarf expression opcode 0xf3) at msg/simple/Pipe.cc:2476
> #4  0x0000000000dc8a90 in Pipe::tcp_read_nonblocking (this=0x52fec800, buf=Unhandled dwarf expression opcode 0xf3) at msg/simple/Pipe.cc:2494
> #5  0x0000000000dc9b36 in Pipe::read_message (this=0x52fec800, pm=0x7fce7a9d8b98, auth_handler=0x5d22ef60) at msg/simple/Pipe.cc:2034
> #6  0x0000000000ddbcc7 in Pipe::reader (this=0x52fec800) at msg/simple/Pipe.cc:1581
> #7  0x0000000000dded6d in Pipe::Reader::entry (this=Unhandled dwarf expression opcode 0xf3) at msg/simple/Pipe.h:50
> #8  0x00000032a2607aa1 in start_thread () from /lib64/libpthread.so.0
> #9  0x00000032a22e893d in clone () from /lib64/libc.so.6
> (gdb) frame 5
> #5  0x0000000000dc9b36 in Pipe::read_message (this=0x52fec800, pm=0x7fce7a9d8b98, auth_handler=0x5d22ef60) at msg/simple/Pipe.cc:2034
> 2034        int got = tcp_read_nonblocking(bp.c_str(), read);
> (gdb) p connection_state
> $3 = {px = 0x49c38b00}
> (gdb) p &((PipeConnection*)0x49c38b00)->lock
> $5 = (Mutex *) 0x49c38b18
> (gdb) p this->port
> $11 = 63805
> (gdb) p this->peer_type
> $12 = 4
> (gdb) printf "%d.%d.%d.%d\n", this->peer_addr.addr4.sin_addr.s_addr&0xFF, this->peer_addr.addr4.sin_addr.s_addr>>8&0xFF, this->peer_addr.addr4.sin_addr.s_addr>>16&0xFF, this->peer_addr.addr4.sin_addr.s_addr>>24&0xFF
> 192.168.30.105
> (gdb) printf "%d\n", (this->peer_addr.addr4.sin_port<<8&0xFF00) | (this->peer_addr.addr4.sin_port>>8&0x00FF)
> 6805
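On the final question in the post, the assumption about recv() is easy
to check in isolation. A small self-contained test (using a socketpair
for brevity; a TCP socket behaves the same for these two cases)
confirms that a nonblocking recv() returns 0 once the peer's close
reaches us, and EAGAIN only while the connection is alive with no data
queued:

    // Checks recv() semantics: EAGAIN = live connection with no data yet;
    // 0 = peer closed. Illustrative test, not Ceph code.
    #include <cerrno>
    #include <cstdio>
    #include <sys/socket.h>
    #include <unistd.h>

    int main() {
      int sv[2];
      if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return 1;
      char c;
      // Live peer, nothing queued: nonblocking recv() fails with EAGAIN.
      ssize_t n = recv(sv[0], &c, 1, MSG_DONTWAIT);
      std::printf("open peer:   n=%zd (%s)\n", n,
                  (n < 0 && errno == EAGAIN) ? "EAGAIN" : "unexpected");
      // Peer closed: recv() returns 0 (end of stream), not EAGAIN.
      close(sv[1]);
      n = recv(sv[0], &c, 1, MSG_DONTWAIT);
      std::printf("closed peer: n=%zd\n", n);
      close(sv[0]);
      return 0;
    }

So recv() returning 0 requires the peer's FIN to actually arrive. One
guess consistent with the data below (ESTABLISHED in lsof on .41,
nothing in netstat on .105): if the remote side's connection state
vanished without a FIN or RST ever reaching 192.168.30.41, the local
kernel will keep the socket ESTABLISHED and keep returning EAGAIN
indefinitely, and that by itself would not be a TCP-stack bug. This is
only a guess from what is in the post, though.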
>
>
> [root@192.168.30.41 ~]# lsof -i :63805
> COMMAND    PID USER  FD   TYPE    DEVICE SIZE/OFF NODE NAME
> ceph-osd 29007 root 647u  IPv4 150861496      0t0  TCP 192.168.30.41:63805->192.168.30.105:6805 (ESTABLISHED)
>
> [root@192.168.30.105 ~]# netstat -nap | grep 63805
> [root@192.168.30.105 ~]#

--
Thank you!
HuangJun