Deadlock between dlg and tsx

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

I've experienced a deadlock between the dlg lock and the tsx grp_lock. The
scenario involves PJSIP in a re-INVITE transaction (dialog is already
established), where PJSIP acts as the UAS. The INVITE has been received,
the 200 OK has been sent. Now the ACK is delayed so much that the ACK is
received at the same time the timeout timer fires.

Now, from what i deduced, the following happened:

Thread 13 receives the delayed ACK from the net:
In pjsip_dlg_on_rx_request a dlg lock is taken: pjsip_dlg_inc_lock(dlg),
afterwards it calls pjsip_tsx_terminate, which tries to take
pj_grp_lock_acquire(tsx->grp_lock)


Thread 25 handles the timeout:
In pjsip_tsx_terminate, it takes pj_grp_lock_acquire(tsx->grp_lock)
Afterwards, tsx_set_state calls (*tsx->tsx_user->on_tsx_state)(tsx, &e),
which is pjsip_dlg_on_tsx_state. That in turn tries to take pjsip_dlg_inc_lock(dlg);

Both callpaths lock the same lock in different order.

Below the backtraces of both threads. I've verified via the mutex owner field
that these threads are indeed waiting on each other. This is with pjsip 2.5.5.

Note that i reduced timerd to 2 sec in this scenario. This caused the 200/ACK
to timeout pretty early (another bug, as this is the wrong timer for it),
and is probably the reason why i saw this happening, as this increased the
chance of it happening.


(gdb) info th
  Id   Target Id         Frame 
  46   Thread 0x7f47501f1700 (LWP 22987) "asterisk" 0x00007f474d98204f in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
  45   Thread 0x7f4750175700 (LWP 22988) "asterisk" 0x00007f474d984010 in sem_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
  44   Thread 0x7f47500f9700 (LWP 22989) "asterisk" 0x00007f474d984010 in sem_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
  43   Thread 0x7f474fd3b700 (LWP 22991) "asterisk" 0x00007f474ca24aed in poll () from /lib/x86_64-linux-gnu/libc.so.6
  42   Thread 0x7f474b1b0700 (LWP 22994) "asterisk" 0x00007f474d98204f in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
  41   Thread 0x7f474b03c700 (LWP 22997) "asterisk" 0x00007f474d984010 in sem_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
  40   Thread 0x7f474afc0700 (LWP 22998) "asterisk" 0x00007f474d984010 in sem_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
  39   Thread 0x7f474af44700 (LWP 22999) "asterisk" 0x00007f474d98204f in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
  38   Thread 0x7f474aec8700 (LWP 23000) "asterisk" 0x00007f474d98204f in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
  37   Thread 0x7f474ae4c700 (LWP 23001) "asterisk" 0x00007f474d984010 in sem_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
  36   Thread 0x7f474add0700 (LWP 23002) "asterisk" 0x00007f474d9823f8 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
  35   Thread 0x7f474ad54700 (LWP 23003) "asterisk" 0x00007f474d984010 in sem_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
  34   Thread 0x7f474acd8700 (LWP 23004) "asterisk" 0x00007f474d984010 in sem_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
  33   Thread 0x7f474ac5c700 (LWP 23005) "asterisk" 0x00007f474d984010 in sem_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
  32   Thread 0x7f474abe0700 (LWP 23006) "asterisk" 0x00007f474d984010 in sem_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
  31   Thread 0x7f474ab64700 (LWP 23007) "asterisk" 0x00007f474ca24aed in poll () from /lib/x86_64-linux-gnu/libc.so.6
  30   Thread 0x7f474aae8700 (LWP 23008) "asterisk" 0x00007f474d984010 in sem_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
  29   Thread 0x7f474aa6c700 (LWP 23009) "asterisk" 0x00007f474d98204f in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
  28   Thread 0x7f474a9f0700 (LWP 23010) "asterisk" 0x00007f474d984010 in sem_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
  27   Thread 0x7f47381fb700 (LWP 23011) "asterisk" 0x00007f474d984010 in sem_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
  26   Thread 0x7f473817f700 (LWP 23012) "asterisk" 0x00007f474d98204f in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
  25   Thread 0x7f46cf18e700 (LWP 23013) "asterisk" 0x00007f474d9847bc in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
  24   Thread 0x7f46ce721700 (LWP 23021) "asterisk" 0x00007f474d9823f8 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
  23   Thread 0x7f46ce6a5700 (LWP 23022) "asterisk" 0x00007f474d9823f8 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
  22   Thread 0x7f46ce629700 (LWP 23023) "asterisk" 0x00007f474d98204f in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
  21   Thread 0x7f4695d9c700 (LWP 23026) "asterisk" 0x00007f474c9fef2d in nanosleep () from /lib/x86_64-linux-gnu/libc.so.6
  20   Thread 0x7f46bc1fb700 (LWP 23027) "asterisk" 0x00007f474d9823f8 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
  19   Thread 0x7f46bc17f700 (LWP 23028) "asterisk" 0x00007f474c9fef2d in nanosleep () from /lib/x86_64-linux-gnu/libc.so.6
  18   Thread 0x7f4694d9a700 (LWP 23036) "asterisk" 0x00007f474d98204f in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
  17   Thread 0x7f465802f700 (LWP 18621) "asterisk" 0x00007f474d9823f8 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
  16   Thread 0x7f4663d17700 (LWP 21339) "asterisk" 0x00007f474d9823f8 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
  15   Thread 0x7f465da47700 (LWP 22460) "asterisk" 0x00007f474d9823f8 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
  14   Thread 0x7f46cc0d0700 (LWP 3421) "asterisk" 0x00007f474ca24aed in poll () from /lib/x86_64-linux-gnu/libc.so.6
  13   Thread 0x7f466155f700 (LWP 8596) "asterisk" 0x00007f474d9847bc in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
  12   Thread 0x7f465e3f7700 (LWP 3454) "asterisk" 0x00007f474ca24aed in poll () from /lib/x86_64-linux-gnu/libc.so.6
  11   Thread 0x7f466a55b700 (LWP 27491) "asterisk" 0x00007f474d98204f in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
  10   Thread 0x7f4695047700 (LWP 27939) "asterisk" 0x00007f474d98204f in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
  9    Thread 0x7f46635d3700 (LWP 28127) "asterisk" 0x00007f474d98204f in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
  8    Thread 0x7f474fcbf700 (LWP 28159) "asterisk" 0x00007f474d98204f in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
  7    Thread 0x7f46689bf700 (LWP 7445) "asterisk" 0x00007f474d9823f8 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
  6    Thread 0x7f46950c3700 (LWP 7446) "asterisk" 0x00007f474d9823f8 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
  5    Thread 0x7f46bc103700 (LWP 7447) "asterisk" 0x00007f474d9823f8 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
  4    Thread 0x7f4668563700 (LWP 7448) "asterisk" 0x00007f474d9823f8 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
  3    Thread 0x7f466a8bf700 (LWP 7449) "asterisk" 0x00007f474d9823f8 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
  2    Thread 0x7f46ce819700 (LWP 7453) "asterisk" 0x00007f474d9823f8 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
* 1    Thread 0x7f47501f37c0 (LWP 22965) "asterisk" 0x00007f474ca24aed in poll () from /lib/x86_64-linux-gnu/libc.so.6



(gdb) thread 13
[Switching to thread 13 (Thread 0x7f466155f700 (LWP 8596))]
#0  0x00007f474d9847bc in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
(gdb) bt full
#0  0x00007f474d9847bc in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#1  0x00007f474d980494 in _L_lock_952 () from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#2  0x00007f474d9802f6 in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#3  0x00007f4748aa6605 in pj_mutex_lock (mutex=0x7f466d3eae68) at ../src/pj/os_core_unix.c:1265
        status = 1832824232
#4  0x00007f4748aac561 in grp_lock_acquire (p=0x7f466d3ead58) at ../src/pj/lock.c:290
        glock = 0x7f466d3ead58
        lck = 0x7f466de06708
#5  0x00007f4748aac5a2 in pj_grp_lock_acquire (grp_lock=<optimized out>) at ../src/pj/lock.c:478
No locals.
#6  0x00007f4749be20cd in pjsip_tsx_terminate (tsx=0x7f466ee29bb8, code=408) at ../src/pjsip/sip_transaction.c:1636
        code = 408
        tsx = 0x7f466ee29bb8
#7  0x00007f474a021dc6 in mod_inv_on_rx_request (rdata=0x7f46c4074c38) at ../src/pjsip-ua/sip_inv.c:586
        method = <optimized out>
        dlg = <optimized out>
        inv = 0x7f466cf687c8
#8  0x00007f4749be4b01 in pjsip_dlg_on_rx_request (dlg=dlg@entry=0x7f466eacb138, rdata=rdata@entry=0x7f46c4074c38) at ../src/pjsip/sip_dialog.c:1694
        status = <optimized out>
        tsx = 0x0
        processed = 0
        i = 1
#9  0x00007f4749be6147 in mod_ua_on_rx_request (rdata=0x7f46c4074c38) at ../src/pjsip/sip_ua_layer.c:699
        dlg = 0x7f466eacb138
        status = <optimized out>
        from_tag = 0x7f46c4076778
#10 0x00007f4749bccf77 in pjsip_endpt_process_rx_data (endpt=<optimized out>, rdata=rdata@entry=0x7f46c4074c38, p=p@entry=0x7f46d0004c40 <param>, p_handled=p_handled@entry=0x7f466155ecf4) at ../src/pjsip/sip_endpoint.c:887
        msg = 0x7f46c4075f30
        def_prm = {start_prio = 1633023744, start_mod = 0x148aa6880, idx_after_start = 0, silent = 0}
        i = <optimized out>
        status = <optimized out>
        mod = 0x7f4749dfb900 <mod_ua>
        handled = 0
        p_handled = 0x7f466155ecf4
        p = 0x7f46d0004c40 <param>
        rdata = 0x7f46c4074c38
        endpt = <optimized out>
#11 0x00007f46cfde0d0c in distribute (data=0x7f46c4074c38) at res_pjsip/pjsip_distributor.c:769
        param = {start_prio = 0, start_mod = 0x7f46d0004d80 <distributor_mod>, idx_after_start = 1, silent = 0}
        handled = 0
        rdata = 0x7f46c4074c38
        is_request = <optimized out>
        is_ack = 1
        endpoint = <optimized out>
#12 0x00007f474ff5d718 in ast_taskprocessor_execute (tps=tps@entry=0x7f4750b0c730) at taskprocessor.c:967
        local = {local_data = 0x7f474ffbc900, data = 0xbfb6f6e317de8400}
        t = 0x7f46c40d6130
        __PRETTY_FUNCTION__ = "ast_taskprocessor_execute"
#13 0x00007f474ff64e20 in execute_tasks (data=0x7f4750b0c730) at threadpool.c:1322
        tps = 0x7f4750b0c730
#14 0x00007f474ff5d718 in ast_taskprocessor_execute (tps=0x7f47504bfb20) at taskprocessor.c:967
        local = {local_data = 0x7f47504bf730, data = 0x7f474fe1e88c <__ao2_unlock+28>}
        t = 0x7f46c40b31f0
        __PRETTY_FUNCTION__ = "ast_taskprocessor_execute"
#15 0x00007f474ff64824 in threadpool_execute (pool=<optimized out>) at threadpool.c:351
No locals.
#16 worker_active (worker=<optimized out>) at threadpool.c:1105
No locals.
#17 worker_start (arg=0x7f46c80015c0) at threadpool.c:1024
        __PRETTY_FUNCTION__ = "worker_start"
#18 0x00007f474ff6d0fc in dummy_start (data=<optimized out>) at utils.c:1235
        __cancel_buf = {__cancel_jmp_buf = {{__cancel_jmp_buf = {139941979924512, 7710576958850181645, 0, 139942986697232, 23, 139940257462016, -7670269739218523635, -7669657610719745523}, __mask_was_saved = 0}}, __pad = {0x7f466155eeb0, 0x0, 
            0x7f474cce6878, 0x7f474cce6880}}
        __cancel_arg = 0x7f466155f700
        __not_first_call = <optimized out>
        ret = <optimized out>
        a = {start_routine = 0x7f474ff64480 <worker_start>, data = 0x7f46c80015c0, name = <optimized out>}
#19 0x00007f474d97e064 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#20 0x00007f474ca2d62d in clone () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.




(gdb) thread 25
[Switching to thread 25 (Thread 0x7f46cf18e700 (LWP 23013))]
#0  0x00007f474d9847bc in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
(gdb) bt full
#0  0x00007f474d9847bc in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#1  0x00007f474d980494 in _L_lock_952 () from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#2  0x00007f474d9802f6 in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#3  0x00007f4748aa6605 in pj_mutex_lock (mutex=0x7f466eacb630) at ../src/pj/os_core_unix.c:1265
        status = 1856812048
#4  0x00007f4749be3b57 in pjsip_dlg_inc_lock (dlg=0x7f466eacb138) at ../src/pjsip/sip_dialog.c:885
No locals.
#5  0x00007f4749be4dd6 in pjsip_dlg_on_tsx_state (dlg=0x7f466eacb138, tsx=0x7f466ee29bb8, e=0x7f46cf18dc10) at ../src/pjsip/sip_dialog.c:2047
        i = <optimized out>
#6  0x00007f4749bdf5e1 in tsx_set_state (tsx=tsx@entry=0x7f466ee29bb8, state=state@entry=PJSIP_TSX_STATE_TERMINATED, event_src_type=event_src_type@entry=PJSIP_EVENT_TIMER, event_src=event_src@entry=0x7f466ee29d70, flag=flag@entry=0)
    at ../src/pjsip/sip_transaction.c:1233
        e = {prev = 0x0, next = 0x7f4750697470, type = PJSIP_EVENT_TSX_STATE, body = {timer = {entry = 0x7f466ee29d70}, tsx_state = {src = {rdata = 0x7f466ee29d70, tdata = 0x7f466ee29d70, timer = 0x7f466ee29d70, status = 1860345200, 
                data = 0x7f466ee29d70}, tsx = 0x7f466ee29bb8, prev_state = 4, type = PJSIP_EVENT_TIMER}, tx_msg = {tdata = 0x7f466ee29d70}, tx_error = {tdata = 0x7f466ee29d70, tsx = 0x7f466ee29bb8}, rx_msg = {rdata = 0x7f466ee29d70}, user = {
              user1 = 0x7f466ee29d70, user2 = 0x7f466ee29bb8, user3 = 0x100000004, user4 = 0xffffffffffffea2b}}}
        prev_state = PJSIP_TSX_STATE_COMPLETED
#7  0x00007f4749be06b0 in tsx_on_state_completed_uas (tsx=0x7f466ee29bb8, event=0x7f46cf18dce0) at ../src/pjsip/sip_transaction.c:3198
No locals.
#8  0x00007f4749bdfa86 in tsx_timer_callback (theap=<optimized out>, entry=0x7f466ee29d70) at ../src/pjsip/sip_transaction.c:1171
        event = {prev = 0x7f47504be918, next = 0x7f474db8e2e0 <__pthread_keys>, type = PJSIP_EVENT_TIMER, body = {timer = {entry = 0x7f466ee29d70}, tsx_state = {src = {rdata = 0x7f466ee29d70, tdata = 0x7f466ee29d70, timer = 0x7f466ee29d70, 
                status = 1860345200, data = 0x7f466ee29d70}, tsx = 0x7f47504b8a01, prev_state = 1219132260, type = 32583}, tx_msg = {tdata = 0x7f466ee29d70}, tx_error = {tdata = 0x7f466ee29d70, tsx = 0x7f47504b8a01}, rx_msg = {
              rdata = 0x7f466ee29d70}, user = {user1 = 0x7f466ee29d70, user2 = 0x7f47504b8a01, user3 = 0x7f4748aa7b64 <elapsed_msec+36>, user4 = 0x3b9aca00}}}
        tsx = 0x7f466ee29bb8
#9  0x00007f4748ab7207 in pj_timer_heap_poll (ht=0x7f47504b8d80, next_delay=next_delay@entry=0x7f46cf18de00) at ../src/pj/timer.c:643
        node = 0x7f466ee29d70
        grp_lock = 0x7f466d3ead58
        now = {sec = 11064205, msec = 106}
        count = 1
#10 0x00007f4749bccc94 in pjsip_endpt_handle_events2 (endpt=0x7f47504b8a98, max_timeout=max_timeout@entry=0x7f46cf18de50, p_count=p_count@entry=0x0) at ../src/pjsip/sip_endpoint.c:713
        timeout = {sec = 0, msec = 0}
        count = 0
        net_event_count = 0
        c = <optimized out>
#11 0x00007f4749bccd97 in pjsip_endpt_handle_events (endpt=<optimized out>, max_timeout=max_timeout@entry=0x7f46cf18de50) at ../src/pjsip/sip_endpoint.c:770
No locals.
#12 0x00007f46cfdd2be8 in monitor_thread_exec (endpt=<optimized out>) at res_pjsip.c:3995
        delay = {sec = 0, msec = 10}
#13 0x00007f4748aa6900 in thread_main (param=0x7f47504be918) at ../src/pj/os_core_unix.c:541
        rec = 0x7f47504be918
        result = <optimized out>
#14 0x00007f474d97e064 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#15 0x00007f474ca2d62d in clone () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.

-- 
Alex

_______________________________________________
Visit our blog: http://blog.pjsip.org

pjsip mailing list
pjsip@xxxxxxxxxxxxxxx
http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org



[Index of Archives]     [Asterisk Users]     [Asterisk App Development]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [Linux API]
  Powered by Linux