Hi David,
Please find the attached combined patch.
Regards,
Ming
On Thu, Mar 23, 2017 at 10:13 PM, David Richards <david.brian.richards@xxxxxxxxx> wrote:
Hi Ming.The patch is working great so far. I'll be doing more testing. Can you send me a single patch file that contains Alex's deadlock patch too (and any other deadlock fixes you might have) so i can test them all together?Thanks,Dave RichardsOn Tue, Mar 21, 2017 at 5:08 AM, Ming <ming@xxxxxxxxx> wrote:Hi David,Ah, right, I completely forgot that application can obtain PJSUA lock in the callback, so even though the library doesn't technically cause the deadlock, it's still not a practical solution.Your initial analysis seems to be correct, it's just that last time, I didn't find a solution yet to prevent the transaction from getting destroyed (as previously reported in ticket #1706 (https://trac.pjsip.org/repos/ticket/1706 ), but now I think I do (in the attached patch).So, thanks for testing it before and please help us test the patch again.Best regards,MingOn Tue, Mar 21, 2017 at 4:05 AM, David Richards <david.brian.richards@xxxxxxxxm > wrote:Hi Ming.Thanks for the quick response. However your patch doesn't fix the problem, it just moved it as shown in the Thread 1 stack backtrace below. Your patch stopped the PJSUA lock from being obtained in pjsua_call_on_state_changed()(frame 7), however, just a few frames later (frame 5), my application calls pjsua_call_on_state_changed() from on_call_state() where an attempt on the PJSUA lock is made and deadlocked.I didn't include the stack back traces of the other tasks because they were essentially blocked the same as before. I admit I don't know this software well, but why was my initial analysis incorrect? Don't the locks always have to be locked in the same order?Thanks for your help,DaveThread 1 (Thread 0x7f1a3935d840 (LWP 27501)):#0 in __lll_lock_wait () from /lib64/libpthread.so.0#1 in _L_lock_839 () from /lib64/libpthread.so.0#2 in pthread_mutex_lock () from /lib64/libpthread.so.0#3 in pj_mutex_lock (mutex=0x4594238) at ../src/pj/os_core_unix.c:1265#4 in PJSUA_LOCK () at ../include/pjsua-lib/pjsua_internal.h:575 #5 in pjsua_call_get_info (call_id=1091, info=0x7fff7ce7c610) at ../src/pjsua-lib/pjsua_call.c:1817 #6 in on_call_state (call_id=1091, e=0x7fff7ce7ceb0) at TERMmain.c:509#7 in pjsua_call_on_state_changed (inv=0x7f1a05272688, e=0x7fff7ce7ceb0) at ../src/pjsua-lib/pjsua_call.c:3824 #8 in inv_set_state (inv=0x7f1a05272688, state=PJSIP_INV_STATE_DISCONNECTED, e=0x7fff7ce7ceb0) at ../src/pjsip-ua/sip_inv.c:317 #9 in inv_on_state_incoming (inv=0x7f1a05272688, e=0x7fff7ce7ceb0) at ../src/pjsip-ua/sip_inv.c:4310#10 in mod_inv_on_tsx_state (tsx=0x7f1a05273608, e=0x7fff7ce7ceb0) at ../src/pjsip-ua/sip_inv.c:717#11 in pjsip_dlg_on_tsx_state (dlg=0x7f1a05c55598, tsx=0x7f1a05273608, e=0x7fff7ce7ceb0) at ../src/pjsip/sip_dialog.c:2064#12 in mod_ua_on_tsx_state (tsx=0x7f1a05273608, e=0x7fff7ce7ceb0) at ../src/pjsip/sip_ua_layer.c:178 #13 in tsx_set_state (tsx=0x7f1a05273608, state=PJSIP_TSX_STATE_COMPLETED, event_src_type=PJSIP_EVENT_TX_ MSG, event_src=0x7f1a043c0c38, flag=0) at ../src/pjsip/sip_transaction.c :1235 #14 in tsx_on_state_proceeding_uas (tsx=0x7f1a05273608, event=0x7fff7ce7cfb0) at ../src/pjsip/sip_transaction.c:2819 #15 in pjsip_tsx_send_msg (tsx=0x7f1a05273608, tdata=0x7f1a043c0c38) at ../src/pjsip/sip_transaction.c:1751 #16 in pjsip_dlg_send_response (dlg=0x7f1a05c55598, tsx=0x7f1a05273608, tdata=0x7f1a043c0c38) at ../src/pjsip/sip_dialog.c:1529#17 in pjsip_inv_send_msg (inv=0x7f1a05272688, tdata=0x7f1a043c0c38) at ../src/pjsip-ua/sip_inv.c:3227#18 in pjsua_call_hangup (call_id=1091, code=606, reason=0x0, msg_data=0x0) at ../src/pjsua-lib/pjsua_call.c:2426 #19 in timer_callback (timer_heap=0x7f1a15cc85f8, timer=0xbf6e90 <dialer_info+3972080>) at TERMmain.c:313#20 in pj_timer_heap_poll (ht=0x7f1a15cc85f8, next_delay=0x0) at ../src/pj/timer.c:643#21 in TIMERpoll () at TIMERmain.c:72#22 in main_func (argc=1, argv=0x7fff7ce7d788) at main.c:352#23 in pj_run_app (main_func=0x40c6a7 <main_func>, argc=1, argv=0x7fff7ce7d788, flags=0) at ../src/pj/os_core_unix.c:1952#24 in main (argc=1, argv=0x7fff7ce7d788) at main.c:782On Mon, Mar 20, 2017 at 5:33 AM, Ming <ming@xxxxxxxxx> wrote:Hi all,Thanks for the report and sorry for the delay in answering.For the problem reported by David, we think the problem is actually caused by thread 1 (instead of thread 3) which shouldn't try to obtain PJSUA's lock after holding the transaction lock. So we propose the fix attached. Please revert your temporary fix (where you remove the acquisition of group lock in pjsip_tsx_layer_find_tsx()) and use our patch instead, and share with us whether it rectifies the issue. Thanks.While for the deadlock issue reported by Alex, despite the similarity, I believe this is an unrelated issue, so I'll reply it separately in the original thread.Finally, for Janu, without the stack trace, I can't be certain that your problem is the same as one or both of these issues. So my suggestion is to apply both patches and see if the problem disappears. Also, I would recommend to upgrade to version 2.6, if you haven't, since 2.3 is already quite old.Best regards,Ming______________________________Also seems like the one I reprted (happens also if the application calls pjsua_call_hangup() between receiving a sip BYE and sending the corresponding OK).I also got no response to that.Regards,Thomas------ Originalnachricht------Von: Alex HermannDatum: Fr., 17. März 2017 09:49Cc:Betreff:Re: Interesting deadlock bug found causing three threads to deadlock on the PJSUA, UA, and transaction locks.On donderdag 16 maart 2017 14:18:25 CET David Richards wrote:> I found a bug that causes my application (using the PJSUA API) to deadlockThis looks like the same deadlock I reported on 7-3-2017. Unfortunately, I got no response to it.-- Alex Hermann_______________________________________________Visit our blog: http://blog.pjsip.orgpjsip mailing listpjsip@lists.pjsip.orghttp: //lists.pjsip.org/mailman/list info/pjsip_lists.pjsip.org _________________
Visit our blog: http://blog.pjsip.org
pjsip mailing list
pjsip@xxxxxxxxxxxxxxx
http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.or g
_______________________________________________
Visit our blog: http://blog.pjsip.org
pjsip mailing list
pjsip@xxxxxxxxxxxxxxx
http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.or g
_______________________________________________
Visit our blog: http://blog.pjsip.org
pjsip mailing list
pjsip@xxxxxxxxxxxxxxx
http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.or g
_______________________________________________
Visit our blog: http://blog.pjsip.org
pjsip mailing list
pjsip@xxxxxxxxxxxxxxx
http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip. org
_______________________________________________
Visit our blog: http://blog.pjsip.org
pjsip mailing list
pjsip@xxxxxxxxxxxxxxx
http://lists.pjsip.org/mailman/listinfo/pjsip_lists. pjsip.org
Index: pjsip/include/pjsip/sip_transaction.h =================================================================== --- pjsip/include/pjsip/sip_transaction.h (revision 5564) +++ pjsip/include/pjsip/sip_transaction.h (working copy) @@ -179,6 +179,10 @@ * Find a transaction with the specified key. The transaction key normally * is created by calling #pjsip_tsx_create_key() from an incoming message. * + * IMPORTANT: To prevent deadlock, application should use + * #pjsip_tsx_layer_find_tsx2() instead which only adds a reference to + * the transaction instead of locking it. + * * @param key The key string to find the transaction. * @param lock If non-zero, transaction will be locked before the * function returns, to make sure that it's not deleted @@ -191,6 +195,21 @@ pj_bool_t lock ); /** + * Find a transaction with the specified key. The transaction key normally + * is created by calling #pjsip_tsx_create_key() from an incoming message. + * + * @param key The key string to find the transaction. + * @param add_ref If non-zero, transaction's reference will be added + * by one before the function returns, to make sure that + * it's not deleted by other threads. + * + * @return The matching transaction instance, or NULL if transaction + * can not be found. + */ +PJ_DECL(pjsip_transaction*) pjsip_tsx_layer_find_tsx2( const pj_str_t *key, + pj_bool_t add_ref ); + +/** * Create, initialize, and register a new transaction as UAC from the * specified transmit data (\c tdata). The transmit data must have a valid * \c Request-Line and \c CSeq header. Index: pjsip/src/pjsip/sip_transaction.c =================================================================== --- pjsip/src/pjsip/sip_transaction.c (revision 5564) +++ pjsip/src/pjsip/sip_transaction.c (working copy) @@ -641,8 +641,8 @@ /* * Find a transaction. */ -PJ_DEF(pjsip_transaction*) pjsip_tsx_layer_find_tsx( const pj_str_t *key, - pj_bool_t lock ) +static pjsip_transaction* find_tsx( const pj_str_t *key, pj_bool_t lock, + pj_bool_t add_ref ) { pjsip_transaction *tsx; pj_uint32_t hval = 0; @@ -654,7 +654,7 @@ /* Prevent the transaction to get deleted before we have chance to lock it. */ - if (tsx && lock) + if (tsx) pj_grp_lock_add_ref(tsx->grp_lock); pj_mutex_unlock(mod_tsx_layer.mutex); @@ -666,9 +666,12 @@ /* Simulate race condition! */ PJ_RACE_ME(5); - if (tsx && lock) { - pj_grp_lock_acquire(tsx->grp_lock); - pj_grp_lock_dec_ref(tsx->grp_lock); + if (tsx) { + if (lock) + pj_grp_lock_acquire(tsx->grp_lock); + + if (!add_ref) + pj_grp_lock_dec_ref(tsx->grp_lock); } return tsx; @@ -675,6 +678,20 @@ } +PJ_DEF(pjsip_transaction*) pjsip_tsx_layer_find_tsx( const pj_str_t *key, + pj_bool_t lock ) +{ + return find_tsx(key, lock, PJ_FALSE); +} + + +PJ_DEF(pjsip_transaction*) pjsip_tsx_layer_find_tsx2( const pj_str_t *key, + pj_bool_t add_ref ) +{ + return find_tsx(key, PJ_FALSE, add_ref); +} + + /* This module callback is called when module is being loaded by * endpoint. It does nothing for this module. */ @@ -1230,7 +1247,29 @@ pjsip_event e; PJSIP_EVENT_INIT_TSX_STATE(e, tsx, event_src_type, event_src, prev_state); + + /* For timer event, release lock to avoid deadlock. + * This should be safe because: + * 1. The tsx state just switches to TERMINATED or DESTROYED. + * 2. There should be no other processing taking place. All other + * events, such as the ones handled by tsx_on_state_terminated() + * should be ignored. + * 3. tsx_shutdown() hasn't been called. + * Refer to ticket #2001 (https://trac.pjsip.org/repos/ticket/2001). + */ + if (event_src_type == PJSIP_EVENT_TIMER && + (pj_timer_entry *)event_src == &tsx->timeout_timer) + { + pj_grp_lock_release(tsx->grp_lock); + } + (*tsx->tsx_user->on_tsx_state)(tsx, &e); + + if (event_src_type == PJSIP_EVENT_TIMER && + (pj_timer_entry *)event_src == &tsx->timeout_timer) + { + pj_grp_lock_acquire(tsx->grp_lock); + } } Index: pjsip/src/pjsip/sip_ua_layer.c =================================================================== --- pjsip/src/pjsip/sip_ua_layer.c (revision 5564) +++ pjsip/src/pjsip/sip_ua_layer.c (working copy) @@ -551,12 +551,12 @@ pjsip_get_invite_method(), rdata); /* Lookup the INVITE transaction */ - tsx = pjsip_tsx_layer_find_tsx(&key, PJ_TRUE); + tsx = pjsip_tsx_layer_find_tsx2(&key, PJ_TRUE); /* We should find the dialog attached to the INVITE transaction */ if (tsx) { dlg = (pjsip_dialog*) tsx->mod_data[mod_ua.mod.id]; - pj_grp_lock_release(tsx->grp_lock); + pj_grp_lock_dec_ref(tsx->grp_lock); /* Dlg may be NULL on some extreme condition * (e.g. during debugging where initially there is a dialog) Index: pjsip/src/pjsip-ua/sip_inv.c =================================================================== --- pjsip/src/pjsip-ua/sip_inv.c (revision 5564) +++ pjsip/src/pjsip-ua/sip_inv.c (working copy) @@ -3275,7 +3275,7 @@ pjsip_tsx_create_key(rdata->tp_info.pool, &key, PJSIP_ROLE_UAS, pjsip_get_invite_method(), rdata); - invite_tsx = pjsip_tsx_layer_find_tsx(&key, PJ_TRUE); + invite_tsx = pjsip_tsx_layer_find_tsx2(&key, PJ_TRUE); if (invite_tsx == NULL) { @@ -3324,7 +3324,7 @@ } if (invite_tsx) - pj_grp_lock_release(invite_tsx->grp_lock); + pj_grp_lock_dec_ref(invite_tsx->grp_lock); }
_______________________________________________ Visit our blog: http://blog.pjsip.org pjsip mailing list pjsip@xxxxxxxxxxxxxxx http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org