Weird deadlock in OSD

Hi all,

I noticed a deadlock phenomenon in a Hammer OSD (strictly speaking,
lock-related starvation may be more precise, since there is no lock
cycle).
Although I found the threads blocking each other, I have not figured
out how they got into this state. Any hint will be appreciated.

As you can see in the following gdb inspection:
* thread 11 has taken osd_lock but is waiting for heartbeat_lock
* thread 37 has taken heartbeat_lock but is waiting for a
PipeConnection's lock
* thread 286 has taken the PipeConnection's lock needed by thread 37,
but it never returns from Pipe::tcp_read_nonblocking() to
Pipe::read_message(); as a result, the PipeConnection's lock is never
released by Pipe::read_message()
* since osd_lock and heartbeat_lock are held, most of the OSD's
threads cannot make progress, and the whole OSD appears deadlocked
(see the sketch after this list)
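
To make the chain concrete, here is a hypothetical reduction of the
hold-and-wait pattern (my own sketch, not Ceph code; the mutex and
function names are made up to mirror the three threads above):

#include <chrono>
#include <mutex>
#include <thread>

std::mutex osd_lock, heartbeat_lock, conn_lock;

void reader() {                 // like Pipe::reader (thread 286)
  std::lock_guard<std::mutex> c(conn_lock);
  for (;;)                      // stuck retrying recv(); conn_lock
    std::this_thread::sleep_for(std::chrono::seconds(1));  // never released
}

void dispatcher() {             // like heartbeat_dispatch (thread 37)
  std::this_thread::sleep_for(std::chrono::milliseconds(100));
  std::lock_guard<std::mutex> h(heartbeat_lock);
  std::lock_guard<std::mutex> c(conn_lock);  // blocks forever on thread 286
}

void ticker() {                 // like OSD::tick (thread 11)
  std::this_thread::sleep_for(std::chrono::milliseconds(200));
  std::lock_guard<std::mutex> o(osd_lock);
  std::lock_guard<std::mutex> h(heartbeat_lock);  // blocks forever on thread 37
}

int main() {
  std::thread a(reader), b(dispatcher), c(ticker);
  a.join(); b.join(); c.join();  // never returns: starvation without a cycle
}

Note there is no cycle here: the chain ends at a thread that simply
never releases its lock, which is why cycle-based deadlock detection
would not flag it.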

A more detailed check on thread 286 shows that recv() always fails
with EAGAIN when invoked from Pipe::do_recv(). Thus the retry logic
in Pipe::do_recv() loops forever, which is why
Pipe::tcp_read_nonblocking() never finishes.
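
For reference, the retry pattern looks roughly like this (a
simplified sketch from memory, not the verbatim Hammer source; see
msg/simple/Pipe.cc for the real thing):

#include <sys/types.h>
#include <sys/socket.h>
#include <cerrno>
#include <cstddef>

ssize_t do_recv_sketch(int sd, char *buf, size_t len, int flags) {
again:
  ssize_t got = ::recv(sd, buf, len, flags);
  if (got < 0) {
    if (errno == EAGAIN || errno == EINTR)
      goto again;   // if recv() yields EAGAIN on every call, this never exits
    return -1;      // real error
  }
  return got;       // 0 would mean the peer performed an orderly shutdown
}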

I examined the Pipe object with gdb and found that it is a connection
from 192.168.30.41:63805 to 192.168.30.105:6805. So I used "lsof" and
"netstat" to check both servers (see the results following the gdb
output). It seems that 192.168.30.41:63805->192.168.30.105:6805 is
still ESTABLISHED on 192.168.30.41, but the TCP connection no longer
exists from 192.168.30.105's perspective.
My understanding is that even if a connection is half-closed on one
side, recv() on the other side should return 0 with no error. So why
does recv() keep returning EAGAIN?
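
That understanding matches what a trivial standalone test shows (my
own demo using a Unix socketpair rather than TCP, but the recv()
semantics in question are the same): EAGAIN means "peer alive but no
data yet", while an orderly close yields 0:

#include <sys/socket.h>
#include <cerrno>
#include <cstdio>
#include <fcntl.h>
#include <unistd.h>

int main() {
  int sv[2];
  if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
    return 1;
  fcntl(sv[0], F_SETFL, O_NONBLOCK);      // mimic a non-blocking Pipe socket

  char buf[16];
  ssize_t n = recv(sv[0], buf, sizeof(buf), 0);
  printf("peer open, no data: n=%zd, EAGAIN=%d\n",   // expect n=-1, EAGAIN=1
         n, (int)(n < 0 && errno == EAGAIN));

  close(sv[1]);                           // peer closes its side
  n = recv(sv[0], buf, sizeof(buf), 0);
  printf("peer closed: n=%zd\n", n);      // expect n=0, no error
  return 0;
}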

Any idea how this happened, or any explanation? Could it be a bug in
the kernel TCP stack (2.6.32)?


(gdb) thread 11
[Switching to thread 11 (Thread 0x7fcebcd2f700 (LWP 29061))]#0
0x00000032a260e334 in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00000032a260e334 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00000032a26095d8 in _L_lock_854 () from /lib64/libpthread.so.0
#2  0x00000032a26094a7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x0000000000c0ba3b in Mutex::Lock (this=0x5bb0ce8,
no_lockdep=Unhandled dwarf expression opcode 0xf3) at
common/Mutex.cc:91
#4  0x0000000000795c29 in Locker (this=0x5bb0000) at common/Mutex.h:115
#5  OSD::maybe_update_heartbeat_peers (this=0x5bb0000) at osd/OSD.cc:3432
#6  0x00000000007bad13 in OSD::tick (this=0x5bb0000) at osd/OSD.cc:3863
#7  0x00000000007cb299 in Context::complete (this=0x5988a60,
r=Unhandled dwarf expression opcode 0xf3) at include/Context.h:65
#8  0x0000000000c21bbf in SafeTimer::timer_thread (this=0x5bb0070) at
common/Timer.cc:105
#9  0x0000000000c2317d in SafeTimerThread::entry (this=Unhandled dwarf
expression opcode 0xf3) at common/Timer.cc:38
#10 0x00000032a2607aa1 in start_thread () from /lib64/libpthread.so.0
#11 0x00000032a22e893d in clone () from /lib64/libc.so.6
(gdb) frame 8
#8  0x0000000000c21bbf in SafeTimer::timer_thread (this=0x5bb0070) at
common/Timer.cc:105
105           callback->complete(0);
(gdb) p &lock
$1 = (Mutex *) 0x5bb0018
(gdb) frame 5
#5  OSD::maybe_update_heartbeat_peers (this=0x5bb0000) at osd/OSD.cc:3432
3432      Mutex::Locker l(heartbeat_lock);
(gdb) p &heartbeat_lock
$2 = (Mutex *) 0x5bb0ce8

(gdb) thread 37
[Switching to thread 37 (Thread 0x7fceaa4e4700 (LWP 29763))]#0
0x00000032a260e334 in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00000032a260e334 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00000032a26095d8 in _L_lock_854 () from /lib64/libpthread.so.0
#2  0x00000032a26094a7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x0000000000c0ba3b in Mutex::Lock (this=0x49c38b18,
no_lockdep=Unhandled dwarf expression opcode 0xf3
) at common/Mutex.cc:91
#4  0x0000000000ddeecb in Locker (this=0x49c38b00, p=0x7fceaa4e2c60)
at common/Mutex.h:115
#5  PipeConnection::try_get_pipe (this=0x49c38b00, p=0x7fceaa4e2c60)
at msg/simple/PipeConnection.cc:38
#6  0x0000000000db52dc in SimpleMessenger::submit_message
(this=0x5aa8e00, m=0x2c6d0b40, con=0x49c38b00, dest_addr=...,
dest_type=4, already_locked=false)
    at msg/simple/SimpleMessenger.cc:443
#7  0x0000000000db5da7 in SimpleMessenger::_send_message
(this=0x5aa8e00, m=0x2c6d0b40, con=0x49c38b00) at
msg/simple/SimpleMessenger.cc:136
#8  0x00000000007a27e0 in OSDService::send_incremental_map
(this=0x5bb16c8, since=58626, con=0x49c38b00, osdmap=Unhandled dwarf
expression opcode 0xf3
) at osd/OSD.cc:1087
#9  0x00000000007a2a89 in OSDService::share_map_peer (this=0x5bb16c8,
peer=28, con=0x49c38b00, map=std::tr1::shared_ptr (count 562)
0x5b367500) at osd/OSD.cc:887
#10 0x00000000007a8324 in OSD::handle_osd_ping (this=0x5bb0000,
m=0x6861f600) at osd/OSD.cc:3599
#11 0x00000000007a8b2b in OSD::heartbeat_dispatch (this=0x5bb0000,
m=0x6861f600) at osd/OSD.cc:5254
#12 0x0000000000de07ab in ms_deliver_dispatch (this=0x5aa9dc8) at
msg/Messenger.h:567
#13 DispatchQueue::entry (this=0x5aa9dc8) at msg/simple/DispatchQueue.cc:185
#14 0x0000000000db86dd in DispatchQueue::DispatchThread::entry
(this=Unhandled dwarf expression opcode 0xf3
) at msg/simple/DispatchQueue.h:103
#15 0x00000032a2607aa1 in start_thread () from /lib64/libpthread.so.0
#16 0x00000032a22e893d in clone () from /lib64/libc.so.6
(gdb) frame 10
#10 0x00000000007a8324 in OSD::handle_osd_ping (this=0x5bb0000,
m=0x6861f600) at osd/OSD.cc:3599
3599                service.share_map_peer(from, con.get());
(gdb) p &heartbeat_lock
$6 = (Mutex *) 0x5bb0ce8
(gdb) frame 5
#5  PipeConnection::try_get_pipe (this=0x49c38b00, p=0x7fceaa4e2c60)
at msg/simple/PipeConnection.cc:38
38        Mutex::Locker l(lock);
(gdb) p &lock
$8 = (Mutex *) 0x49c38b18
(gdb)
(gdb) thread 286
[Switching to thread 286 (Thread 0x7fce7a9d9700 (LWP 15873))]#0
0x00000032a260ec2c in recv () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00000032a260ec2c in recv () from /lib64/libpthread.so.0
#1  0x0000000000dc8724 in recv (this=0x52fec800,
    buf=0xe27c000
"\235\302óH[\002\220R1\035\251\334,-(\353L\rJ\247\232i\210._j\001\375\357\351U\211⤸}\322}8\250\\\374\325W\020f\237\037\314\300Ty\251-\307%\277\ni\205\213\031\245\024\314҂j\223\025\207\212Z`4\340E5!\016\006\227\064\320h\334*\224\204\061\037r\017ʍ\325\f/\202G\257\065&j%\"\222\027\064\204\322n\244\315K\220\354.i\255\206\004z\320M&j\\\200\256x␚|\343\r\237_\347Q\023I\261\216\315Y\202M\361\217Q\301\252\224\370$\330\370\354\334T\261\226\263E%!\245pC\262)\r%\031\244\061\262(u*{\325B\n\222\017QW*+\230\262\067\216\335~\224"...,
len=4096, flags=64) at /usr/include/bits/socket2.h:45
#2  Pipe::do_recv (this=0x52fec800,
    buf=0xe27c000
"\235\302óH[\002\220R1\035\251\334,-(\353L\rJ\247\232i\210._j\001\375\357\351U\211⤸}\322}8\250\\\374\325W\020f\237\037\314\300Ty\251-\307%\277\ni\205\213\031\245\024\314҂j\223\025\207\212Z`4\340E5!\016\006\227\064\320h\334*\224\204\061\037r\017ʍ\325\f/\202G\257\065&j%\"\222\027\064\204\322n\244\315K\220\354.i\255\206\004z\320M&j\\\200\256x␚|\343\r\237_\347Q\023I\261\216\315Y\202M\361\217Q\301\252\224\370$\330\370\354\334T\261\226\263E%!\245pC\262)\r%\031\244\061\262(u*{\325B\n\222\017QW*+\230\262\067\216\335~\224"...,
len=4096, flags=64) at msg/simple/Pipe.cc:2430
#3  0x0000000000dc8990 in Pipe::buffered_recv (this=0x52fec800,
buf=0x4d29e3e1 "", len=3823, flags=Unhandled dwarf expression opcode
0xf3
) at msg/simple/Pipe.cc:2476
#4  0x0000000000dc8a90 in Pipe::tcp_read_nonblocking (this=0x52fec800,
buf=Unhandled dwarf expression opcode 0xf3
) at msg/simple/Pipe.cc:2494
#5  0x0000000000dc9b36 in Pipe::read_message (this=0x52fec800,
pm=0x7fce7a9d8b98, auth_handler=0x5d22ef60) at msg/simple/Pipe.cc:2034
#6  0x0000000000ddbcc7 in Pipe::reader (this=0x52fec800) at
msg/simple/Pipe.cc:1581
#7  0x0000000000dded6d in Pipe::Reader::entry (this=Unhandled dwarf
expression opcode 0xf3
) at msg/simple/Pipe.h:50
#8  0x00000032a2607aa1 in start_thread () from /lib64/libpthread.so.0
#9  0x00000032a22e893d in clone () from /lib64/libc.so.6
(gdb) frame 5
#5  0x0000000000dc9b36 in Pipe::read_message (this=0x52fec800,
pm=0x7fce7a9d8b98, auth_handler=0x5d22ef60) at msg/simple/Pipe.cc:2034
2034          int got = tcp_read_nonblocking(bp.c_str(), read);
(gdb) p connection_state
$3 = {px = 0x49c38b00}
(gdb) p &((PipeConnection*)0x49c38b00)->lock
$5 = (Mutex *) 0x49c38b18
(gdb) p this->port
$11 = 63805
(gdb) p this->peer_type
$12 = 4
(gdb) printf "%d.%d.%d.%d\n",
this->peer_addr.addr4.sin_addr.s_addr&0xFF,
this->peer_addr.addr4.sin_addr.s_addr>>8&0xFF ,
this->peer_addr.addr4.sin_addr.s_addr>>16&0xFF,
this->peer_addr.addr4.sin_addr.s_addr>>24&0xFF
192.168.30.105
(gdb) printf "%d\n", (this->peer_addr.addr4.sin_port<<8&0xFF00) |
(this->peer_addr.addr4.sin_port>>8&0x00FF)
6805


[root@192.168.30.41 ~]# lsof -i :63805
COMMAND    PID USER   FD   TYPE    DEVICE SIZE/OFF NODE NAME
ceph-osd 29007 root  647u  IPv4 150861496      0t0  TCP
192.168.30.41:63805->192.168.30.105:6805 (ESTABLISHED)

[root@192.168.30.105 ~]# netstat -nap | grep 63805
[root@192.168.30.105 ~]#