Hi David,
Thanks for all your help in testing the patches.
@Ross: Yes, we just committed the patches in ticket #2001 (https://trac.pjsip.org/repos/ticket/2001 ) and #2002 (https://trac.pjsip.org/repos/ticket/2002 ).
Best regards,
Ming
On Tue, Mar 28, 2017 at 10:32 PM, David Richards <david.brian.richards@xxxxxxxxx> wrote:
Hi Ming.I tested your deadlock patch over the weekend and it worked great with no problems!Thanks again.,DaveOn Thu, Mar 23, 2017 at 7:26 PM, Ming <ming@xxxxxxxxx> wrote:Hi David,Please find the attached combined patch.Regards,MingOn Thu, Mar 23, 2017 at 10:13 PM, David Richards <david.brian.richards@xxxxxxxxm > wrote:Hi Ming.The patch is working great so far. I'll be doing more testing. Can you send me a single patch file that contains Alex's deadlock patch too (and any other deadlock fixes you might have) so i can test them all together?Thanks,Dave RichardsOn Tue, Mar 21, 2017 at 5:08 AM, Ming <ming@xxxxxxxxx> wrote:Hi David,Ah, right, I completely forgot that application can obtain PJSUA lock in the callback, so even though the library doesn't technically cause the deadlock, it's still not a practical solution.Your initial analysis seems to be correct, it's just that last time, I didn't find a solution yet to prevent the transaction from getting destroyed (as previously reported in ticket #1706 (https://trac.pjsip.org/repos/ticket/1706 ), but now I think I do (in the attached patch).So, thanks for testing it before and please help us test the patch again.Best regards,MingOn Tue, Mar 21, 2017 at 4:05 AM, David Richards <david.brian.richards@xxxxxxxxm > wrote:Hi Ming.Thanks for the quick response. However your patch doesn't fix the problem, it just moved it as shown in the Thread 1 stack backtrace below. Your patch stopped the PJSUA lock from being obtained in pjsua_call_on_state_changed()(frame 7), however, just a few frames later (frame 5), my application calls pjsua_call_on_state_changed() from on_call_state() where an attempt on the PJSUA lock is made and deadlocked.I didn't include the stack back traces of the other tasks because they were essentially blocked the same as before. I admit I don't know this software well, but why was my initial analysis incorrect? Don't the locks always have to be locked in the same order?Thanks for your help,DaveThread 1 (Thread 0x7f1a3935d840 (LWP 27501)):#0 in __lll_lock_wait () from /lib64/libpthread.so.0#1 in _L_lock_839 () from /lib64/libpthread.so.0#2 in pthread_mutex_lock () from /lib64/libpthread.so.0#3 in pj_mutex_lock (mutex=0x4594238) at ../src/pj/os_core_unix.c:1265#4 in PJSUA_LOCK () at ../include/pjsua-lib/pjsua_internal.h:575 #5 in pjsua_call_get_info (call_id=1091, info=0x7fff7ce7c610) at ../src/pjsua-lib/pjsua_call.c:1817 #6 in on_call_state (call_id=1091, e=0x7fff7ce7ceb0) at TERMmain.c:509#7 in pjsua_call_on_state_changed (inv=0x7f1a05272688, e=0x7fff7ce7ceb0) at ../src/pjsua-lib/pjsua_call.c:3824 #8 in inv_set_state (inv=0x7f1a05272688, state=PJSIP_INV_STATE_DISCONNECTED, e=0x7fff7ce7ceb0) at ../src/pjsip-ua/sip_inv.c:317 #9 in inv_on_state_incoming (inv=0x7f1a05272688, e=0x7fff7ce7ceb0) at ../src/pjsip-ua/sip_inv.c:4310#10 in mod_inv_on_tsx_state (tsx=0x7f1a05273608, e=0x7fff7ce7ceb0) at ../src/pjsip-ua/sip_inv.c:717#11 in pjsip_dlg_on_tsx_state (dlg=0x7f1a05c55598, tsx=0x7f1a05273608, e=0x7fff7ce7ceb0) at ../src/pjsip/sip_dialog.c:2064#12 in mod_ua_on_tsx_state (tsx=0x7f1a05273608, e=0x7fff7ce7ceb0) at ../src/pjsip/sip_ua_layer.c:178 #13 in tsx_set_state (tsx=0x7f1a05273608, state=PJSIP_TSX_STATE_COMPLETED, event_src_type=PJSIP_EVENT_TX_ MSG, event_src=0x7f1a043c0c38, flag=0) at ../src/pjsip/sip_transaction.c :1235 #14 in tsx_on_state_proceeding_uas (tsx=0x7f1a05273608, event=0x7fff7ce7cfb0) at ../src/pjsip/sip_transaction.c:2819 #15 in pjsip_tsx_send_msg (tsx=0x7f1a05273608, tdata=0x7f1a043c0c38) at ../src/pjsip/sip_transaction.c:1751 #16 in pjsip_dlg_send_response (dlg=0x7f1a05c55598, tsx=0x7f1a05273608, tdata=0x7f1a043c0c38) at ../src/pjsip/sip_dialog.c:1529#17 in pjsip_inv_send_msg (inv=0x7f1a05272688, tdata=0x7f1a043c0c38) at ../src/pjsip-ua/sip_inv.c:3227#18 in pjsua_call_hangup (call_id=1091, code=606, reason=0x0, msg_data=0x0) at ../src/pjsua-lib/pjsua_call.c:2426 #19 in timer_callback (timer_heap=0x7f1a15cc85f8, timer=0xbf6e90 <dialer_info+3972080>) at TERMmain.c:313#20 in pj_timer_heap_poll (ht=0x7f1a15cc85f8, next_delay=0x0) at ../src/pj/timer.c:643#21 in TIMERpoll () at TIMERmain.c:72#22 in main_func (argc=1, argv=0x7fff7ce7d788) at main.c:352#23 in pj_run_app (main_func=0x40c6a7 <main_func>, argc=1, argv=0x7fff7ce7d788, flags=0) at ../src/pj/os_core_unix.c:1952#24 in main (argc=1, argv=0x7fff7ce7d788) at main.c:782On Mon, Mar 20, 2017 at 5:33 AM, Ming <ming@xxxxxxxxx> wrote:Hi all,Thanks for the report and sorry for the delay in answering.For the problem reported by David, we think the problem is actually caused by thread 1 (instead of thread 3) which shouldn't try to obtain PJSUA's lock after holding the transaction lock. So we propose the fix attached. Please revert your temporary fix (where you remove the acquisition of group lock in pjsip_tsx_layer_find_tsx()) and use our patch instead, and share with us whether it rectifies the issue. Thanks.While for the deadlock issue reported by Alex, despite the similarity, I believe this is an unrelated issue, so I'll reply it separately in the original thread.Finally, for Janu, without the stack trace, I can't be certain that your problem is the same as one or both of these issues. So my suggestion is to apply both patches and see if the problem disappears. Also, I would recommend to upgrade to version 2.6, if you haven't, since 2.3 is already quite old.Best regards,Ming______________________________Also seems like the one I reprted (happens also if the application calls pjsua_call_hangup() between receiving a sip BYE and sending the corresponding OK).I also got no response to that.Regards,Thomas------ Originalnachricht------Von: Alex HermannDatum: Fr., 17. März 2017 09:49Cc:Betreff:Re: Interesting deadlock bug found causing three threads to deadlock on the PJSUA, UA, and transaction locks.On donderdag 16 maart 2017 14:18:25 CET David Richards wrote:> I found a bug that causes my application (using the PJSUA API) to deadlockThis looks like the same deadlock I reported on 7-3-2017. Unfortunately, I got no response to it.-- Alex Hermann_______________________________________________Visit our blog: http://blog.pjsip.orgpjsip mailing listpjsip@lists.pjsip.orghttp: //lists.pjsip.org/mailman/list info/pjsip_lists.pjsip.org _________________
Visit our blog: http://blog.pjsip.org
pjsip mailing list
pjsip@xxxxxxxxxxxxxxx
http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.or g
_______________________________________________
Visit our blog: http://blog.pjsip.org
pjsip mailing list
pjsip@xxxxxxxxxxxxxxx
http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.or g
_______________________________________________
Visit our blog: http://blog.pjsip.org
pjsip mailing list
pjsip@xxxxxxxxxxxxxxx
http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.or g
_______________________________________________
Visit our blog: http://blog.pjsip.org
pjsip mailing list
pjsip@xxxxxxxxxxxxxxx
http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.or g
_______________________________________________
Visit our blog: http://blog.pjsip.org
pjsip mailing list
pjsip@xxxxxxxxxxxxxxx
http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.or g
_______________________________________________
Visit our blog: http://blog.pjsip.org
pjsip mailing list
pjsip@xxxxxxxxxxxxxxx
http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip. org
_______________________________________________
Visit our blog: http://blog.pjsip.org
pjsip mailing list
pjsip@xxxxxxxxxxxxxxx
http://lists.pjsip.org/mailman/listinfo/pjsip_lists. pjsip.org
_______________________________________________ Visit our blog: http://blog.pjsip.org pjsip mailing list pjsip@xxxxxxxxxxxxxxx http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org