Re: Weird deadlock in OSD

Hi, you can create an issue on the tracker website:
http://tracker.ceph.com/projects/ceph/issues
We have noticed some osd_tp timeout suicide situations, which may have
the same root cause as the one in your post.

2016-09-27 9:47 GMT+08:00 Jeegn Chen <jeegnchen@xxxxxxxxx>:
> Hi all,
>
> I noticed a deadlock phenomenon in a Hammer OSD (strictly speaking,
> lock-related starvation may be more precise, since there is no lock
> cycle).
> Although I found the threads blocking each other, I have not figured
> out how they got into this state. Any hint will be appreciated.
>
> As you can see from the following gdb inspection:
> * thread 11 has taken osd_lock but is waiting for heartbeat_lock
> * thread 37 has taken heartbeat_lock but is waiting for a PipeConnection's lock
> * thread 286 has taken the PipeConnection's lock needed by thread 37,
> but it never returns from Pipe::tcp_read_nonblocking() to
> Pipe::read_message(); as a result, the PipeConnection's lock is never
> released by Pipe::read_message()
> * since osd_lock and heartbeat_lock are tied up, most of the OSD's work
> cannot make progress, so it looks like a deadlock (see the toy sketch below)
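>
> To make the wait chain concrete, it has the same shape as this toy
> example (plain std::mutex, nothing to do with the actual Ceph code):
> C holds a lock and never releases it because its work never completes,
> B holds a second lock while waiting for C's, and A holds a third lock
> while waiting for B's. There is no cycle, so strictly it is starvation
> rather than a textbook deadlock.
>
>   #include <chrono>
>   #include <mutex>
>   #include <thread>
>
>   std::mutex osd_lock, heartbeat_lock, conn_lock;
>
>   int main() {
>     // "thread 286": holds the connection lock and never releases it,
>     // because its read never finishes.
>     std::thread reader([] {
>       std::lock_guard<std::mutex> g(conn_lock);
>       std::this_thread::sleep_for(std::chrono::hours(24));
>     });
>     // "thread 37": holds heartbeat_lock, then blocks on the connection lock.
>     std::thread dispatcher([] {
>       std::this_thread::sleep_for(std::chrono::milliseconds(100));
>       std::lock_guard<std::mutex> g1(heartbeat_lock);
>       std::lock_guard<std::mutex> g2(conn_lock);
>     });
>     // "thread 11": holds osd_lock, then blocks on heartbeat_lock.
>     std::thread ticker([] {
>       std::this_thread::sleep_for(std::chrono::milliseconds(200));
>       std::lock_guard<std::mutex> g1(osd_lock);
>       std::lock_guard<std::mutex> g2(heartbeat_lock);
>     });
>     reader.join(); dispatcher.join(); ticker.join();
>   }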
>
> A more detailed check on thread 286 shows that an EAGAIN error is
> always returned when recv() is invoked in Pipe::do_recv(). Thus, the
> Pipe::do_recv() logic retries forever, which is why
> Pipe::tcp_read_nonblocking() never finishes.
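>
> The retry logic, roughly paraphrased (this is a simplified sketch, not
> the exact Hammer source; see msg/simple/Pipe.cc for the real code), is:
>
>   #include <cerrno>
>   #include <sys/socket.h>
>
>   ssize_t do_recv_sketch(int sd, char *buf, size_t len, int flags) {
>   again:
>     ssize_t got = ::recv(sd, buf, len, flags);  // flags includes MSG_DONTWAIT (64)
>     if (got < 0) {
>       if (errno == EAGAIN || errno == EINTR)
>         goto again;     // spins as long as recv() keeps returning EAGAIN
>       return -1;        // hard error: give up on the read
>     }
>     if (got == 0)
>       return -1;        // orderly EOF from the peer
>     return got;
>   }
>
> With recv() returning -1 and errno stuck at EAGAIN, that loop never exits.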
>
> I examined the Pipe object with gdb and found that it is a connection
> from 192.168.30.41:63805 to 192.168.30.105:6805. So I used "lsof" and
> "netstat" to check both servers (see the results following the gdb
> output). It seems that 192.168.30.41:63805->192.168.30.105:6805 is
> still ESTABLISHED on 192.168.30.41, but the TCP connection no longer
> exists from 192.168.30.105's perspective.
> My understanding is that even if a connection is half-closed on one
> side, recv() on the other side should return 0 with no error. So why
> does recv() keep returning EAGAIN?
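>
> To state that expectation more concretely, here is a tiny standalone
> sketch of the semantics I expect (plain sockets, nothing Ceph-specific;
> it uses an AF_UNIX socketpair instead of TCP just for brevity):
>
>   #include <cerrno>
>   #include <cstdio>
>   #include <sys/socket.h>
>   #include <unistd.h>
>
>   int main() {
>     int sv[2];
>     if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
>       return 1;
>
>     char buf[16];
>     // No data queued yet: a nonblocking recv() fails with EAGAIN.
>     ssize_t r = recv(sv[0], buf, sizeof(buf), MSG_DONTWAIT);
>     std::printf("before close: r=%zd errno=%d (EAGAIN=%d)\n", r, errno, EAGAIN);
>
>     // The peer closes its end: recv() should now return 0 (EOF), not EAGAIN.
>     close(sv[1]);
>     r = recv(sv[0], buf, sizeof(buf), MSG_DONTWAIT);
>     std::printf("after close:  r=%zd errno=%d\n", r, errno);
>
>     close(sv[0]);
>     return 0;
>   }
>
> EAGAIN would only be expected while the connection is still alive but
> has no data pending.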
>
> Any idea how this happens, or any explanation? Could it be a bug in the
> kernel TCP stack (2.6.32)?
>
>
> (gdb) thread 11
> [Switching to thread 11 (Thread 0x7fcebcd2f700 (LWP 29061))]#0
> 0x00000032a260e334 in __lll_lock_wait () from /lib64/libpthread.so.0
> (gdb) bt
> #0  0x00000032a260e334 in __lll_lock_wait () from /lib64/libpthread.so.0
> #1  0x00000032a26095d8 in _L_lock_854 () from /lib64/libpthread.so.0
> #2  0x00000032a26094a7 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3  0x0000000000c0ba3b in Mutex::Lock (this=0x5bb0ce8,
> no_lockdep=Unhandled dwarf expression opcode 0xf3) at
> common/Mutex.cc:91
> #4  0x0000000000795c29 in Locker (this=0x5bb0000) at common/Mutex.h:115
> #5  OSD::maybe_update_heartbeat_peers (this=0x5bb0000) at osd/OSD.cc:3432
> #6  0x00000000007bad13 in OSD::tick (this=0x5bb0000) at osd/OSD.cc:3863
> #7  0x00000000007cb299 in Context::complete (this=0x5988a60,
> r=Unhandled dwarf expression opcode 0xf3) at include/Context.h:65
> #8  0x0000000000c21bbf in SafeTimer::timer_thread (this=0x5bb0070) at
> common/Timer.cc:105
> #9  0x0000000000c2317d in SafeTimerThread::entry (this=Unhandled dwarf
> expression opcode 0xf3) at common/Timer.cc:38
> #10 0x00000032a2607aa1 in start_thread () from /lib64/libpthread.so.0
> #11 0x00000032a22e893d in clone () from /lib64/libc.so.6
> (gdb) frame 8
> #8  0x0000000000c21bbf in SafeTimer::timer_thread (this=0x5bb0070) at
> common/Timer.cc:105
> 105           callback->complete(0);
> (gdb) p &lock
> $1 = (Mutex *) 0x5bb0018
> (gdb) frame 5
> #5  OSD::maybe_update_heartbeat_peers (this=0x5bb0000) at osd/OSD.cc:3432
> 3432      Mutex::Locker l(heartbeat_lock);
> (gdb) p &heartbeat_lock
> $2 = (Mutex *) 0x5bb0ce8
>
> (gdb) thread 37
> [Switching to thread 37 (Thread 0x7fceaa4e4700 (LWP 29763))]#0
> 0x00000032a260e334 in __lll_lock_wait () from /lib64/libpthread.so.0
> (gdb) bt
> #0  0x00000032a260e334 in __lll_lock_wait () from /lib64/libpthread.so.0
> #1  0x00000032a26095d8 in _L_lock_854 () from /lib64/libpthread.so.0
> #2  0x00000032a26094a7 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3  0x0000000000c0ba3b in Mutex::Lock (this=0x49c38b18,
> no_lockdep=Unhandled dwarf expression opcode 0xf3
> ) at common/Mutex.cc:91
> #4  0x0000000000ddeecb in Locker (this=0x49c38b00, p=0x7fceaa4e2c60)
> at common/Mutex.h:115
> #5  PipeConnection::try_get_pipe (this=0x49c38b00, p=0x7fceaa4e2c60)
> at msg/simple/PipeConnection.cc:38
> #6  0x0000000000db52dc in SimpleMessenger::submit_message
> (this=0x5aa8e00, m=0x2c6d0b40, con=0x49c38b00, dest_addr=...,
> dest_type=4, already_locked=false)
>     at msg/simple/SimpleMessenger.cc:443
> #7  0x0000000000db5da7 in SimpleMessenger::_send_message
> (this=0x5aa8e00, m=0x2c6d0b40, con=0x49c38b00) at
> msg/simple/SimpleMessenger.cc:136
> #8  0x00000000007a27e0 in OSDService::send_incremental_map
> (this=0x5bb16c8, since=58626, con=0x49c38b00, osdmap=Unhandled dwarf
> expression opcode 0xf3
> ) at osd/OSD.cc:1087
> #9  0x00000000007a2a89 in OSDService::share_map_peer (this=0x5bb16c8,
> peer=28, con=0x49c38b00, map=std::tr1::shared_ptr (count 562)
> 0x5b367500) at osd/OSD.cc:887
> #10 0x00000000007a8324 in OSD::handle_osd_ping (this=0x5bb0000,
> m=0x6861f600) at osd/OSD.cc:3599
> #11 0x00000000007a8b2b in OSD::heartbeat_dispatch (this=0x5bb0000,
> m=0x6861f600) at osd/OSD.cc:5254
> #12 0x0000000000de07ab in ms_deliver_dispatch (this=0x5aa9dc8) at
> msg/Messenger.h:567
> #13 DispatchQueue::entry (this=0x5aa9dc8) at msg/simple/DispatchQueue.cc:185
> #14 0x0000000000db86dd in DispatchQueue::DispatchThread::entry
> (this=Unhandled dwarf expression opcode 0xf3
> ) at msg/simple/DispatchQueue.h:103
> #15 0x00000032a2607aa1 in start_thread () from /lib64/libpthread.so.0
> #16 0x00000032a22e893d in clone () from /lib64/libc.so.6
> (gdb) frame 10
> #10 0x00000000007a8324 in OSD::handle_osd_ping (this=0x5bb0000,
> m=0x6861f600) at osd/OSD.cc:3599
> 3599                service.share_map_peer(from, con.get());
> (gdb) p &heartbeat_lock
> $6 = (Mutex *) 0x5bb0ce8
> (gdb) frame 5
> #5  PipeConnection::try_get_pipe (this=0x49c38b00, p=0x7fceaa4e2c60)
> at msg/simple/PipeConnection.cc:38
> 38        Mutex::Locker l(lock);
> (gdb) p &lock
> $8 = (Mutex *) 0x49c38b18
> (gdb)
> (gdb) thread 286
> [Switching to thread 286 (Thread 0x7fce7a9d9700 (LWP 15873))]#0
> 0x00000032a260ec2c in recv () from /lib64/libpthread.so.0
> (gdb) bt
> #0  0x00000032a260ec2c in recv () from /lib64/libpthread.so.0
> #1  0x0000000000dc8724 in recv (this=0x52fec800,
>     buf=0xe27c000
> "\235\302óH[\002\220R1\035\251\334,-(\353L\rJ\247\232i\210._j\001\375\357\351U\211⤸}\322}8\250\\\374\325W\020f\237\037\314\300Ty\251-\307%\277\ni\205\213\031\245\024\314҂j\223\025\207\212Z`4\340E5!\016\006\227\064\320h\334*\224\204\061\037r\017ʍ\325\f/\202G\257\065&j%\"\222\027\064\204\322n\244\315K\220\354.i\255\206\004z\320M&j\\\200\256x␚|\343\r\237_\347Q\023I\261\216\315Y\202M\361\217Q\301\252\224\370$\330\370\354\334T\261\226\263E%!\245pC\262)\r%\031\244\061\262(u*{\325B\n\222\017QW*+\230\262\067\216\335~\224"...,
> len=4096, flags=64) at /usr/include/bits/socket2.h:45
> #2  Pipe::do_recv (this=0x52fec800,
>     buf=0xe27c000
> "\235\302óH[\002\220R1\035\251\334,-(\353L\rJ\247\232i\210._j\001\375\357\351U\211⤸}\322}8\250\\\374\325W\020f\237\037\314\300Ty\251-\307%\277\ni\205\213\031\245\024\314҂j\223\025\207\212Z`4\340E5!\016\006\227\064\320h\334*\224\204\061\037r\017ʍ\325\f/\202G\257\065&j%\"\222\027\064\204\322n\244\315K\220\354.i\255\206\004z\320M&j\\\200\256x␚|\343\r\237_\347Q\023I\261\216\315Y\202M\361\217Q\301\252\224\370$\330\370\354\334T\261\226\263E%!\245pC\262)\r%\031\244\061\262(u*{\325B\n\222\017QW*+\230\262\067\216\335~\224"...,
> len=4096, flags=64) at msg/simple/Pipe.cc:2430
> #3  0x0000000000dc8990 in Pipe::buffered_recv (this=0x52fec800,
> buf=0x4d29e3e1 "", len=3823, flags=Unhandled dwarf expression opcode
> 0xf3
> ) at msg/simple/Pipe.cc:2476
> #4  0x0000000000dc8a90 in Pipe::tcp_read_nonblocking (this=0x52fec800,
> buf=Unhandled dwarf expression opcode 0xf3
> ) at msg/simple/Pipe.cc:2494
> #5  0x0000000000dc9b36 in Pipe::read_message (this=0x52fec800,
> pm=0x7fce7a9d8b98, auth_handler=0x5d22ef60) at msg/simple/Pipe.cc:2034
> #6  0x0000000000ddbcc7 in Pipe::reader (this=0x52fec800) at
> msg/simple/Pipe.cc:1581
> #7  0x0000000000dded6d in Pipe::Reader::entry (this=Unhandled dwarf
> expression opcode 0xf3
> ) at msg/simple/Pipe.h:50
> #8  0x00000032a2607aa1 in start_thread () from /lib64/libpthread.so.0
> #9  0x00000032a22e893d in clone () from /lib64/libc.so.6
> (gdb) frame 5
> #5  0x0000000000dc9b36 in Pipe::read_message (this=0x52fec800,
> pm=0x7fce7a9d8b98, auth_handler=0x5d22ef60) at msg/simple/Pipe.cc:2034
> 2034          int got = tcp_read_nonblocking(bp.c_str(), read);
> (gdb) p connection_state
> $3 = {px = 0x49c38b00}
> (gdb) p &((PipeConnection*)0x49c38b00)->lock
> $5 = (Mutex *) 0x49c38b18
> (gdb) p this->port
> $11 = 63805
> (gdb) p this->peer_type
> $12 = 4
> (gdb) printf "%d.%d.%d.%d\n",
> this->peer_addr.addr4.sin_addr.s_addr&0xFF,
> this->peer_addr.addr4.sin_addr.s_addr>>8&0xFF ,
> this->peer_addr.addr4.sin_addr.s_addr>>16&0xFF,
> this->peer_addr.addr4.sin_addr.s_addr>>24&0xFF
> 192.168.30.105
> (gdb) printf "%d\n", (this->peer_addr.addr4.sin_port<<8&0xFF00) |
> (this->peer_addr.addr4.sin_port>>8&0x00FF)
> 6805
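>
> The printf expressions above just undo the network byte order of
> sin_addr and sin_port by hand. In ordinary code the same decoding would
> look like this (print_peer() is a hypothetical helper, not something
> from Ceph):
>
>   #include <arpa/inet.h>
>   #include <netinet/in.h>
>   #include <cstdio>
>
>   static void print_peer(const sockaddr_in &addr4) {
>     char ip[INET_ADDRSTRLEN];
>     inet_ntop(AF_INET, &addr4.sin_addr, ip, sizeof(ip));          // "192.168.30.105"
>     std::printf("%s:%u\n", ip, (unsigned)ntohs(addr4.sin_port));  // ":6805"
>   }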
>
>
> [root@192.168.30.41 ~]# lsof -i :63805
> COMMAND    PID USER   FD   TYPE    DEVICE SIZE/OFF NODE NAME
> ceph-osd 29007 root  647u  IPv4 150861496      0t0  TCP
> 192.168.30.41:63805->192.168.30.105:6805 (ESTABLISHED)
>
> [root@192.168.30.105 ~]# netstat -nap | grep 63805
> [root@192.168.30.105 ~]#



-- 
Thank you!
HuangJun



