Re: Wierd deadlock in OSD

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

http://tracker.ceph.com/issues/17411 has been created

Thanks,
Jeegn

2016-09-27 10:00 GMT+08:00 huang jun <hjwsm1989@xxxxxxxxx>:
> hi, you can create an issue on website:
> http://tracker.ceph.com/projects/ceph/issues
> we have notice some osd_tp timeout suicide situation, maybe have the
> same reason with your post.
>
> 2016-09-27 9:47 GMT+08:00 Jeegn Chen <jeegnchen@xxxxxxxxx>:
>> Hi all,
>>
>> I noticed a deadlock phenomenon in Hammer OSD (Strictly speaking,
>> starvation related to locks may be more precise since there is no lock
>> circle).
>> Although I found the threads locking each other, I have not figured
>> out why they came to the state. Any hint will be appreciate.
>>
>> As you see in the following inspection through gdb
>> * thread 11 has taken osd_lock but is waiting for heartbeat_lock
>> * thread 37 has taken heartbeat_lock but is waiting for a PipeConnection's lock
>> * thread 286 has taken the PipeConnection's lock needed by thread 37,
>> but it never returns from Pipe::tcp_read_nonblocking() to
>> Pipe::read_message(); As a result, the PipeConnection's lock is never
>> released by Pipe::read_message()
>> * Since osd_lock and heartbeat_lock are reserved, most of the OSD's
>> lock cannot move ahead and looks deadlock
>>
>> More detailed check on thread 286 shows that,EAGAIN error is always
>> received when recv() is invoked in Pipe::do_recv(). Thus,
>> Pipe::do_recv() logic retries forever, which is why
>> Pipe::tcp_read_nonblocking() cannot finish.
>>
>> I examined the Pipe object with gdb  and know it is a connection from
>> 192.168.30.41:63805 to 192.168.30.105:6805. So I used "lsof" and
>> "netstat" to check both servers (See the result following the gdb
>> content). It seems that 192.168.30.41:63805->192.168.30.105:6805 is
>> still ESTABILISHED on 192.168.30.41 but the TCP connection does not
>> exist in 192.168.30.105's perspective.
>> My understanding is that even if a connection is half-closed on one
>> side, recv() on the other side should return 0 and give no error. But
>> why does recv() give EAGAIN constantly?
>>
>> Any idea on how this happen or any explanation? Some bug in kernel TCP
>> stack (2.6.32)?
>>
>>
>> (gdb) thread 11
>> [Switching to thread 11 (Thread 0x7fcebcd2f700 (LWP 29061))]#0
>> 0x00000032a260e334 in __lll_lock_wait () from /lib64/libpthread.so.0
>> (gdb) bt
>> #0  0x00000032a260e334 in __lll_lock_wait () from /lib64/libpthread.so.0
>> #1  0x00000032a26095d8 in _L_lock_854 () from /lib64/libpthread.so.0
>> #2  0x00000032a26094a7 in pthread_mutex_lock () from /lib64/libpthread.so.0
>> #3  0x0000000000c0ba3b in Mutex::Lock (this=0x5bb0ce8,
>> no_lockdep=Unhandled dwarf expression opcode 0xf3) at
>> common/Mutex.cc:91
>> #4  0x0000000000795c29 in Locker (this=0x5bb0000) at common/Mutex.h:115
>> #5  OSD::maybe_update_heartbeat_peers (this=0x5bb0000) at osd/OSD.cc:3432
>> #6  0x00000000007bad13 in OSD::tick (this=0x5bb0000) at osd/OSD.cc:3863
>> #7  0x00000000007cb299 in Context::complete (this=0x5988a60,
>> r=Unhandled dwarf expression opcode 0xf3) at include/Context.h:65
>> #8  0x0000000000c21bbf in SafeTimer::timer_thread (this=0x5bb0070) at
>> common/Timer.cc:105
>> #9  0x0000000000c2317d in SafeTimerThread::entry (this=Unhandled dwarf
>> expression opcode 0xf3) at common/Timer.cc:38
>> #10 0x00000032a2607aa1 in start_thread () from /lib64/libpthread.so.0
>> #11 0x00000032a22e893d in clone () from /lib64/libc.so.6
>> (gdb) frame 8
>> #8  0x0000000000c21bbf in SafeTimer::timer_thread (this=0x5bb0070) at
>> common/Timer.cc:105
>> 105           callback->complete(0);
>> (gdb) p &lock
>> $1 = (Mutex *) 0x5bb0018
>> (gdb) frame 5
>> #5  OSD::maybe_update_heartbeat_peers (this=0x5bb0000) at osd/OSD.cc:3432
>> 3432      Mutex::Locker l(heartbeat_lock);
>> (gdb) p &heartbeat_lock
>> $2 = (Mutex *) 0x5bb0ce8
>>
>> (gdb) thread 37
>> [Switching to thread 37 (Thread 0x7fceaa4e4700 (LWP 29763))]#0
>> 0x00000032a260e334 in __lll_lock_wait () from /lib64/libpthread.so.0
>> (gdb) bt
>> #0  0x00000032a260e334 in __lll_lock_wait () from /lib64/libpthread.so.0
>> #1  0x00000032a26095d8 in _L_lock_854 () from /lib64/libpthread.so.0
>> #2  0x00000032a26094a7 in pthread_mutex_lock () from /lib64/libpthread.so.0
>> #3  0x0000000000c0ba3b in Mutex::Lock (this=0x49c38b18,
>> no_lockdep=Unhandled dwarf expression opcode 0xf3
>> ) at common/Mutex.cc:91
>> #4  0x0000000000ddeecb in Locker (this=0x49c38b00, p=0x7fceaa4e2c60)
>> at common/Mutex.h:115
>> #5  PipeConnection::try_get_pipe (this=0x49c38b00, p=0x7fceaa4e2c60)
>> at msg/simple/PipeConnection.cc:38
>> #6  0x0000000000db52dc in SimpleMessenger::submit_message
>> (this=0x5aa8e00, m=0x2c6d0b40, con=0x49c38b00, dest_addr=...,
>> dest_type=4, already_locked=false)
>>     at msg/simple/SimpleMessenger.cc:443
>> #7  0x0000000000db5da7 in SimpleMessenger::_send_message
>> (this=0x5aa8e00, m=0x2c6d0b40, con=0x49c38b00) at
>> msg/simple/SimpleMessenger.cc:136
>> #8  0x00000000007a27e0 in OSDService::send_incremental_map
>> (this=0x5bb16c8, since=58626, con=0x49c38b00, osdmap=Unhandled dwarf
>> expression opcode 0xf3
>> ) at osd/OSD.cc:1087
>> #9  0x00000000007a2a89 in OSDService::share_map_peer (this=0x5bb16c8,
>> peer=28, con=0x49c38b00, map=std::tr1::shared_ptr (count 562)
>> 0x5b367500) at osd/OSD.cc:887
>> #10 0x00000000007a8324 in OSD::handle_osd_ping (this=0x5bb0000,
>> m=0x6861f600) at osd/OSD.cc:3599
>> #11 0x00000000007a8b2b in OSD::heartbeat_dispatch (this=0x5bb0000,
>> m=0x6861f600) at osd/OSD.cc:5254
>> #12 0x0000000000de07ab in ms_deliver_dispatch (this=0x5aa9dc8) at
>> msg/Messenger.h:567
>> #13 DispatchQueue::entry (this=0x5aa9dc8) at msg/simple/DispatchQueue.cc:185
>> #14 0x0000000000db86dd in DispatchQueue::DispatchThread::entry
>> (this=Unhandled dwarf expression opcode 0xf3
>> ) at msg/simple/DispatchQueue.h:103
>> #15 0x00000032a2607aa1 in start_thread () from /lib64/libpthread.so.0
>> #16 0x00000032a22e893d in clone () from /lib64/libc.so.6
>> (gdb) frame 10
>> #10 0x00000000007a8324 in OSD::handle_osd_ping (this=0x5bb0000,
>> m=0x6861f600) at osd/OSD.cc:3599
>> 3599                service.share_map_peer(from, con.get());
>> (gdb) p &heartbeat_lock
>> $6 = (Mutex *) 0x5bb0ce8
>> (gdb) frame 5
>> #5  PipeConnection::try_get_pipe (this=0x49c38b00, p=0x7fceaa4e2c60)
>> at msg/simple/PipeConnection.cc:38
>> 38        Mutex::Locker l(lock);
>> (gdb) p &lock
>> $8 = (Mutex *) 0x49c38b18
>> (gdb)
>> (gdb) thread 286
>> [Switching to thread 286 (Thread 0x7fce7a9d9700 (LWP 15873))]#0
>> 0x00000032a260ec2c in recv () from /lib64/libpthread.so.0
>> (gdb) bt
>> #0  0x00000032a260ec2c in recv () from /lib64/libpthread.so.0
>> #1  0x0000000000dc8724 in recv (this=0x52fec800,
>>     buf=0xe27c000
>> "\235\302óH[\002\220R1\035\251\334,-(\353L\rJ\247\232i\210._j\001\375\357\351U\211⤸}\322}8\250\\\374\325W\020f\237\037\314\300Ty\251-\307%\277\ni\205\213\031\245\024\314҂j\223\025\207\212Z`4\340E5!\016\006\227\064\320h\334*\224\204\061\037r\017ʍ\325\f/\202G\257\065&j%\"\222\027\064\204\322n\244\315K\220\354.i\255\206\004z\320M&j\\\200\256x␚|\343\r\237_\347Q\023I\261\216\315Y\202M\361\217Q\301\252\224\370$\330\370\354\334T\261\226\263E%!\245pC\262)\r%\031\244\061\262(u*{\325B\n\222\017QW*+\230\262\067\216\335~\224"...,
>> len=4096, flags=64) at /usr/include/bits/socket2.h:45
>> #2  Pipe::do_recv (this=0x52fec800,
>>     buf=0xe27c000
>> "\235\302óH[\002\220R1\035\251\334,-(\353L\rJ\247\232i\210._j\001\375\357\351U\211⤸}\322}8\250\\\374\325W\020f\237\037\314\300Ty\251-\307%\277\ni\205\213\031\245\024\314҂j\223\025\207\212Z`4\340E5!\016\006\227\064\320h\334*\224\204\061\037r\017ʍ\325\f/\202G\257\065&j%\"\222\027\064\204\322n\244\315K\220\354.i\255\206\004z\320M&j\\\200\256x␚|\343\r\237_\347Q\023I\261\216\315Y\202M\361\217Q\301\252\224\370$\330\370\354\334T\261\226\263E%!\245pC\262)\r%\031\244\061\262(u*{\325B\n\222\017QW*+\230\262\067\216\335~\224"...,
>> len=4096, flags=64) at msg/simple/Pipe.cc:2430
>> #3  0x0000000000dc8990 in Pipe::buffered_recv (this=0x52fec800,
>> buf=0x4d29e3e1 "", len=3823, flags=Unhandled dwarf expression opcode
>> 0xf3
>> ) at msg/simple/Pipe.cc:2476
>> #4  0x0000000000dc8a90 in Pipe::tcp_read_nonblocking (this=0x52fec800,
>> buf=Unhandled dwarf expression opcode 0xf3
>> ) at msg/simple/Pipe.cc:2494
>> #5  0x0000000000dc9b36 in Pipe::read_message (this=0x52fec800,
>> pm=0x7fce7a9d8b98, auth_handler=0x5d22ef60) at msg/simple/Pipe.cc:2034
>> #6  0x0000000000ddbcc7 in Pipe::reader (this=0x52fec800) at
>> msg/simple/Pipe.cc:1581
>> #7  0x0000000000dded6d in Pipe::Reader::entry (this=Unhandled dwarf
>> expression opcode 0xf3
>> ) at msg/simple/Pipe.h:50
>> #8  0x00000032a2607aa1 in start_thread () from /lib64/libpthread.so.0
>> #9  0x00000032a22e893d in clone () from /lib64/libc.so.6
>> (gdb) frame 5
>> #5  0x0000000000dc9b36 in Pipe::read_message (this=0x52fec800,
>> pm=0x7fce7a9d8b98, auth_handler=0x5d22ef60) at msg/simple/Pipe.cc:2034
>> 2034          int got = tcp_read_nonblocking(bp.c_str(), read);
>> (gdb) p connection_state
>> $3 = {px = 0x49c38b00}
>> (gdb) p &((PipeConnection*)0x49c38b00)->lock
>> $5 = (Mutex *) 0x49c38b18
>> (gdb) p this->port
>> $11 = 63805
>> (gdb) p this->peer_type
>> $12 = 4
>> (gdb) printf "%d.%d.%d.%d\n",
>> this->peer_addr.addr4.sin_addr.s_addr&0xFF,
>> this->peer_addr.addr4.sin_addr.s_addr>>8&0xFF ,
>> this->peer_addr.addr4.sin_addr.s_addr>>16&0xFF,
>> this->peer_addr.addr4.sin_addr.s_addr>>24&0xFF
>> 192.168.30.105
>> (gdb) printf "%d\n", (this->peer_addr.addr4.sin_port<<8&0xFF00) |
>> (this->peer_addr.addr4.sin_port>>8&0x00FF)
>> 6805
>>
>>
>> [root@192.168.30.41 ~]# lsof -i :63805
>> COMMAND    PID USER   FD   TYPE    DEVICE SIZE/OFF NODE NAME
>> ceph-osd 29007 root  647u  IPv4 150861496      0t0  TCP
>> 192.168.30.41:63805->192.168.30.105:6805 (ESTABLISHED)
>>
>> [root@192.168.30.105 ~]# netstat -nap | grep 63805
>> [root@192.168.30.105 ~]#
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>
>
> --
> Thank you!
> HuangJun
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [CEPH Users]     [Ceph Large]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux