Hi Alex,
As promised, attached is the patch which should fix the issue. Please let us know your test result and if you have any feedback. Thanks.
Best regards,
Ming
On Tue, Mar 7, 2017 at 7:37 PM, Alex <alex-lists@xxxxxxxxx> wrote:
Hi,
I've experienced a deadlock between the dlg lock and the tsx grp_lock. The
scenario involves PJSIP in a re-INVITE transaction (dialog is already
established), where PJSIP acts as the UAS. The INVITE has been received,
the 200 OK has been sent. Now the ACK is delayed so much that the ACK is
received at the same time the timeout timer fires.
Now, from what i deduced, the following happened:
Thread 13 receives the delayed ACK from the net:
In pjsip_dlg_on_rx_request a dlg lock is taken: pjsip_dlg_inc_lock(dlg),
afterwards it calls pjsip_tsx_terminate, which tries to take
pj_grp_lock_acquire(tsx->grp_lock)
Thread 25 handles the timeout:
In pjsip_tsx_terminate, it takes pj_grp_lock_acquire(tsx->grp_lock)
Afterwards, tsx_set_state calls (*tsx->tsx_user->on_tsx_state)(tsx, &e),
which is pjsip_dlg_on_tsx_state. That in turn tries to take pjsip_dlg_inc_lock(dlg);
Both callpaths lock the same lock in different order.
Below the backtraces of both threads. I've verified via the mutex owner field
that these threads are indeed waiting on each other. This is with pjsip 2.5.5.
Note that i reduced timerd to 2 sec in this scenario. This caused the 200/ACK
to timeout pretty early (another bug, as this is the wrong timer for it),
and is probably the reason why i saw this happening, as this increased the
chance of it happening.
(gdb) info th
Id Target Id Frame
46 Thread 0x7f47501f1700 (LWP 22987) "asterisk" 0x00007f474d98204f in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
45 Thread 0x7f4750175700 (LWP 22988) "asterisk" 0x00007f474d984010 in sem_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
44 Thread 0x7f47500f9700 (LWP 22989) "asterisk" 0x00007f474d984010 in sem_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
43 Thread 0x7f474fd3b700 (LWP 22991) "asterisk" 0x00007f474ca24aed in poll () from /lib/x86_64-linux-gnu/libc.so.6
42 Thread 0x7f474b1b0700 (LWP 22994) "asterisk" 0x00007f474d98204f in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
41 Thread 0x7f474b03c700 (LWP 22997) "asterisk" 0x00007f474d984010 in sem_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
40 Thread 0x7f474afc0700 (LWP 22998) "asterisk" 0x00007f474d984010 in sem_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
39 Thread 0x7f474af44700 (LWP 22999) "asterisk" 0x00007f474d98204f in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
38 Thread 0x7f474aec8700 (LWP 23000) "asterisk" 0x00007f474d98204f in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
37 Thread 0x7f474ae4c700 (LWP 23001) "asterisk" 0x00007f474d984010 in sem_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
36 Thread 0x7f474add0700 (LWP 23002) "asterisk" 0x00007f474d9823f8 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/ libpthread.so.0
35 Thread 0x7f474ad54700 (LWP 23003) "asterisk" 0x00007f474d984010 in sem_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
34 Thread 0x7f474acd8700 (LWP 23004) "asterisk" 0x00007f474d984010 in sem_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
33 Thread 0x7f474ac5c700 (LWP 23005) "asterisk" 0x00007f474d984010 in sem_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
32 Thread 0x7f474abe0700 (LWP 23006) "asterisk" 0x00007f474d984010 in sem_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
31 Thread 0x7f474ab64700 (LWP 23007) "asterisk" 0x00007f474ca24aed in poll () from /lib/x86_64-linux-gnu/libc.so.6
30 Thread 0x7f474aae8700 (LWP 23008) "asterisk" 0x00007f474d984010 in sem_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
29 Thread 0x7f474aa6c700 (LWP 23009) "asterisk" 0x00007f474d98204f in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
28 Thread 0x7f474a9f0700 (LWP 23010) "asterisk" 0x00007f474d984010 in sem_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
27 Thread 0x7f47381fb700 (LWP 23011) "asterisk" 0x00007f474d984010 in sem_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
26 Thread 0x7f473817f700 (LWP 23012) "asterisk" 0x00007f474d98204f in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
25 Thread 0x7f46cf18e700 (LWP 23013) "asterisk" 0x00007f474d9847bc in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
24 Thread 0x7f46ce721700 (LWP 23021) "asterisk" 0x00007f474d9823f8 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/ libpthread.so.0
23 Thread 0x7f46ce6a5700 (LWP 23022) "asterisk" 0x00007f474d9823f8 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/ libpthread.so.0
22 Thread 0x7f46ce629700 (LWP 23023) "asterisk" 0x00007f474d98204f in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
21 Thread 0x7f4695d9c700 (LWP 23026) "asterisk" 0x00007f474c9fef2d in nanosleep () from /lib/x86_64-linux-gnu/libc.so.6
20 Thread 0x7f46bc1fb700 (LWP 23027) "asterisk" 0x00007f474d9823f8 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/ libpthread.so.0
19 Thread 0x7f46bc17f700 (LWP 23028) "asterisk" 0x00007f474c9fef2d in nanosleep () from /lib/x86_64-linux-gnu/libc.so.6
18 Thread 0x7f4694d9a700 (LWP 23036) "asterisk" 0x00007f474d98204f in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
17 Thread 0x7f465802f700 (LWP 18621) "asterisk" 0x00007f474d9823f8 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/ libpthread.so.0
16 Thread 0x7f4663d17700 (LWP 21339) "asterisk" 0x00007f474d9823f8 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/ libpthread.so.0
15 Thread 0x7f465da47700 (LWP 22460) "asterisk" 0x00007f474d9823f8 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/ libpthread.so.0
14 Thread 0x7f46cc0d0700 (LWP 3421) "asterisk" 0x00007f474ca24aed in poll () from /lib/x86_64-linux-gnu/libc.so.6
13 Thread 0x7f466155f700 (LWP 8596) "asterisk" 0x00007f474d9847bc in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
12 Thread 0x7f465e3f7700 (LWP 3454) "asterisk" 0x00007f474ca24aed in poll () from /lib/x86_64-linux-gnu/libc.so.6
11 Thread 0x7f466a55b700 (LWP 27491) "asterisk" 0x00007f474d98204f in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
10 Thread 0x7f4695047700 (LWP 27939) "asterisk" 0x00007f474d98204f in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
9 Thread 0x7f46635d3700 (LWP 28127) "asterisk" 0x00007f474d98204f in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
8 Thread 0x7f474fcbf700 (LWP 28159) "asterisk" 0x00007f474d98204f in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
7 Thread 0x7f46689bf700 (LWP 7445) "asterisk" 0x00007f474d9823f8 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/ libpthread.so.0
6 Thread 0x7f46950c3700 (LWP 7446) "asterisk" 0x00007f474d9823f8 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/ libpthread.so.0
5 Thread 0x7f46bc103700 (LWP 7447) "asterisk" 0x00007f474d9823f8 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/ libpthread.so.0
4 Thread 0x7f4668563700 (LWP 7448) "asterisk" 0x00007f474d9823f8 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/ libpthread.so.0
3 Thread 0x7f466a8bf700 (LWP 7449) "asterisk" 0x00007f474d9823f8 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/ libpthread.so.0
2 Thread 0x7f46ce819700 (LWP 7453) "asterisk" 0x00007f474d9823f8 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/ libpthread.so.0
* 1 Thread 0x7f47501f37c0 (LWP 22965) "asterisk" 0x00007f474ca24aed in poll () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) thread 13
[Switching to thread 13 (Thread 0x7f466155f700 (LWP 8596))]
#0 0x00007f474d9847bc in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
(gdb) bt full
#0 0x00007f474d9847bc in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#1 0x00007f474d980494 in _L_lock_952 () from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#2 0x00007f474d9802f6 in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#3 0x00007f4748aa6605 in pj_mutex_lock (mutex=0x7f466d3eae68) at ../src/pj/os_core_unix.c:1265
status = 1832824232
#4 0x00007f4748aac561 in grp_lock_acquire (p=0x7f466d3ead58) at ../src/pj/lock.c:290
glock = 0x7f466d3ead58
lck = 0x7f466de06708
#5 0x00007f4748aac5a2 in pj_grp_lock_acquire (grp_lock=<optimized out>) at ../src/pj/lock.c:478
No locals.
#6 0x00007f4749be20cd in pjsip_tsx_terminate (tsx=0x7f466ee29bb8, code=408) at ../src/pjsip/sip_transaction.c:1636
code = 408
tsx = 0x7f466ee29bb8
#7 0x00007f474a021dc6 in mod_inv_on_rx_request (rdata=0x7f46c4074c38) at ../src/pjsip-ua/sip_inv.c:586
method = <optimized out>
dlg = <optimized out>
inv = 0x7f466cf687c8
#8 0x00007f4749be4b01 in pjsip_dlg_on_rx_request (dlg=dlg@entry=0x7f466eacb138, rdata=rdata@entry=0x7f46c4074c38) at ../src/pjsip/sip_dialog.c:1694
status = <optimized out>
tsx = 0x0
processed = 0
i = 1
#9 0x00007f4749be6147 in mod_ua_on_rx_request (rdata=0x7f46c4074c38) at ../src/pjsip/sip_ua_layer.c:699
dlg = 0x7f466eacb138
status = <optimized out>
from_tag = 0x7f46c4076778
#10 0x00007f4749bccf77 in pjsip_endpt_process_rx_data (endpt=<optimized out>, rdata=rdata@entry=0x7f46c4074c38, p=p@entry=0x7f46d0004c40 <param>, p_handled=p_handled@entry= 0x7f466155ecf4) at ../src/pjsip/sip_endpoint.c: 887
msg = 0x7f46c4075f30
def_prm = {start_prio = 1633023744, start_mod = 0x148aa6880, idx_after_start = 0, silent = 0}
i = <optimized out>
status = <optimized out>
mod = 0x7f4749dfb900 <mod_ua>
handled = 0
p_handled = 0x7f466155ecf4
p = 0x7f46d0004c40 <param>
rdata = 0x7f46c4074c38
endpt = <optimized out>
#11 0x00007f46cfde0d0c in distribute (data="" at res_pjsip/pjsip_distributor.c:769
param = {start_prio = 0, start_mod = 0x7f46d0004d80 <distributor_mod>, idx_after_start = 1, silent = 0}
handled = 0
rdata = 0x7f46c4074c38
is_request = <optimized out>
is_ack = 1
endpoint = <optimized out>
#12 0x00007f474ff5d718 in ast_taskprocessor_execute (tps=tps@entry=0x7f4750b0c730) at taskprocessor.c:967
local = {local_data = 0x7f474ffbc900, data = ""> t = 0x7f46c40d6130
__PRETTY_FUNCTION__ = "ast_taskprocessor_execute"
#13 0x00007f474ff64e20 in execute_tasks (data="" at threadpool.c:1322
tps = 0x7f4750b0c730
#14 0x00007f474ff5d718 in ast_taskprocessor_execute (tps=0x7f47504bfb20) at taskprocessor.c:967
local = {local_data = 0x7f47504bf730, data = "" <__ao2_unlock+28>}
t = 0x7f46c40b31f0
__PRETTY_FUNCTION__ = "ast_taskprocessor_execute"
#15 0x00007f474ff64824 in threadpool_execute (pool=<optimized out>) at threadpool.c:351
No locals.
#16 worker_active (worker=<optimized out>) at threadpool.c:1105
No locals.
#17 worker_start (arg=0x7f46c80015c0) at threadpool.c:1024
__PRETTY_FUNCTION__ = "worker_start"
#18 0x00007f474ff6d0fc in dummy_start (data="" out>) at utils.c:1235
__cancel_buf = {__cancel_jmp_buf = {{__cancel_jmp_buf = {139941979924512, 7710576958850181645, 0, 139942986697232, 23, 139940257462016, -7670269739218523635, -7669657610719745523}, __mask_was_saved = 0}}, __pad = {0x7f466155eeb0, 0x0,
0x7f474cce6878, 0x7f474cce6880}}
__cancel_arg = 0x7f466155f700
__not_first_call = <optimized out>
ret = <optimized out>
a = {start_routine = 0x7f474ff64480 <worker_start>, data = "" name = <optimized out>}
#19 0x00007f474d97e064 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#20 0x00007f474ca2d62d in clone () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
(gdb) thread 25
[Switching to thread 25 (Thread 0x7f46cf18e700 (LWP 23013))]
#0 0x00007f474d9847bc in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
(gdb) bt full
#0 0x00007f474d9847bc in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#1 0x00007f474d980494 in _L_lock_952 () from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#2 0x00007f474d9802f6 in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#3 0x00007f4748aa6605 in pj_mutex_lock (mutex=0x7f466eacb630) at ../src/pj/os_core_unix.c:1265
status = 1856812048
#4 0x00007f4749be3b57 in pjsip_dlg_inc_lock (dlg=0x7f466eacb138) at ../src/pjsip/sip_dialog.c:885
No locals.
#5 0x00007f4749be4dd6 in pjsip_dlg_on_tsx_state (dlg=0x7f466eacb138, tsx=0x7f466ee29bb8, e=0x7f46cf18dc10) at ../src/pjsip/sip_dialog.c:2047
i = <optimized out>
#6 0x00007f4749bdf5e1 in tsx_set_state (tsx=tsx@entry=0x7f466ee29bb8, state=state@entry=PJSIP_TSX_STATE_TERMINATED, event_src_type=event_src_type@ entry=PJSIP_EVENT_TIMER, event_src=event_src@entry= 0x7f466ee29d70, flag=flag@entry=0)
at ../src/pjsip/sip_transaction.c:1233
e = {prev = 0x0, next = 0x7f4750697470, type = PJSIP_EVENT_TSX_STATE, body = {timer = {entry = 0x7f466ee29d70}, tsx_state = {src = "" = 0x7f466ee29d70, tdata = 0x7f466ee29d70, timer = 0x7f466ee29d70, status = 1860345200,
data = "" tsx = 0x7f466ee29bb8, prev_state = 4, type = PJSIP_EVENT_TIMER}, tx_msg = {tdata = 0x7f466ee29d70}, tx_error = {tdata = 0x7f466ee29d70, tsx = 0x7f466ee29bb8}, rx_msg = {rdata = 0x7f466ee29d70}, user = {
user1 = 0x7f466ee29d70, user2 = 0x7f466ee29bb8, user3 = 0x100000004, user4 = 0xffffffffffffea2b}}}
prev_state = PJSIP_TSX_STATE_COMPLETED
#7 0x00007f4749be06b0 in tsx_on_state_completed_uas (tsx=0x7f466ee29bb8, event=0x7f46cf18dce0) at ../src/pjsip/sip_transaction.c:3198
No locals.
#8 0x00007f4749bdfa86 in tsx_timer_callback (theap=<optimized out>, entry=0x7f466ee29d70) at ../src/pjsip/sip_transaction.c:1171
event = {prev = 0x7f47504be918, next = 0x7f474db8e2e0 <__pthread_keys>, type = PJSIP_EVENT_TIMER, body = {timer = {entry = 0x7f466ee29d70}, tsx_state = {src = "" = 0x7f466ee29d70, tdata = 0x7f466ee29d70, timer = 0x7f466ee29d70,
status = 1860345200, data = "" tsx = 0x7f47504b8a01, prev_state = 1219132260, type = 32583}, tx_msg = {tdata = 0x7f466ee29d70}, tx_error = {tdata = 0x7f466ee29d70, tsx = 0x7f47504b8a01}, rx_msg = {
rdata = 0x7f466ee29d70}, user = {user1 = 0x7f466ee29d70, user2 = 0x7f47504b8a01, user3 = 0x7f4748aa7b64 <elapsed_msec+36>, user4 = 0x3b9aca00}}}
tsx = 0x7f466ee29bb8
#9 0x00007f4748ab7207 in pj_timer_heap_poll (ht=0x7f47504b8d80, next_delay=next_delay@entry=0x7f46cf18de00) at ../src/pj/timer.c:643
node = 0x7f466ee29d70
grp_lock = 0x7f466d3ead58
now = {sec = 11064205, msec = 106}
count = 1
#10 0x00007f4749bccc94 in pjsip_endpt_handle_events2 (endpt=0x7f47504b8a98, max_timeout=max_timeout@entry=0x7f46cf18de50, p_count=p_count@entry=0x0) at ../src/pjsip/sip_endpoint.c: 713
timeout = {sec = 0, msec = 0}
count = 0
net_event_count = 0
c = <optimized out>
#11 0x00007f4749bccd97 in pjsip_endpt_handle_events (endpt=<optimized out>, max_timeout=max_timeout@entry=0x7f46cf18de50) at ../src/pjsip/sip_endpoint.c: 770
No locals.
#12 0x00007f46cfdd2be8 in monitor_thread_exec (endpt=<optimized out>) at res_pjsip.c:3995
delay = {sec = 0, msec = 10}
#13 0x00007f4748aa6900 in thread_main (param=0x7f47504be918) at ../src/pj/os_core_unix.c:541
rec = 0x7f47504be918
result = <optimized out>
#14 0x00007f474d97e064 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#15 0x00007f474ca2d62d in clone () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
--
Alex
_______________________________________________
Visit our blog: http://blog.pjsip.org
pjsip mailing list
pjsip@xxxxxxxxxxxxxxx
http://lists.pjsip.org/mailman/listinfo/pjsip_lists. pjsip.org
Index: pjsip/src/pjsip/sip_transaction.c =================================================================== --- pjsip/src/pjsip/sip_transaction.c (revision 5564) +++ pjsip/src/pjsip/sip_transaction.c (working copy) @@ -1230,7 +1230,23 @@ pjsip_event e; PJSIP_EVENT_INIT_TSX_STATE(e, tsx, event_src_type, event_src, prev_state); + + /* For timer event, release lock to avoid deadlock. + * Refer to ticket #2001 (https://trac.pjsip.org/repos/ticket/2001). + */ + if (event_src_type == PJSIP_EVENT_TIMER && + (pj_timer_entry *)event_src == &tsx->timeout_timer) + { + pj_grp_lock_release(tsx->grp_lock); + } + (*tsx->tsx_user->on_tsx_state)(tsx, &e); + + if (event_src_type == PJSIP_EVENT_TIMER && + (pj_timer_entry *)event_src == &tsx->timeout_timer) + { + pj_grp_lock_acquire(tsx->grp_lock); + } }
_______________________________________________ Visit our blog: http://blog.pjsip.org pjsip mailing list pjsip@xxxxxxxxxxxxxxx http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org