Re: Interesting deadlock bug found causing three threads to deadlock on the PJSUA, UA, and transaction locks.

Hi Ming,


Will this be committed to SVN?


Kind regards,


Ross




From: pjsip <pjsip-bounces@xxxxxxxxxxxxxxx> on behalf of Ming <ming@xxxxxxxxx>
Sent: 24 March 2017 00:26
To: pjsip list
Subject: Re: Interesting deadlock bug found causing three threads to deadlock on the PJSUA, UA, and transaction locks.
 
Hi David,

Please find the attached combined patch.

Regards,
Ming

On Thu, Mar 23, 2017 at 10:13 PM, David Richards <david.brian.richards@xxxxxxxxx> wrote:
Hi Ming.

The patch is working great so far. I'll be doing more testing. Can you send me a single patch file that also contains Alex's deadlock patch (and any other deadlock fixes you might have) so I can test them all together?

Thanks,
Dave Richards

On Tue, Mar 21, 2017 at 5:08 AM, Ming <ming@xxxxxxxxx> wrote:
Hi David,

Ah, right, I completely forgot that the application can obtain the PJSUA lock in the callback, so even though the library doesn't technically cause the deadlock, it's still not a practical solution.

Your initial analysis seems to be correct. It's just that last time I hadn't yet found a way to prevent the transaction from getting destroyed (as previously reported in ticket #1706, https://trac.pjsip.org/repos/ticket/1706), but now I think I have (see the attached patch).
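
For readers without the attachment, here is a minimal sketch of one general way to keep an object from being destroyed while a callback runs without its lock held: take an extra reference before the callback and drop it afterwards. This is not the attached patch. It uses plain pthreads rather than PJSIP's lock types, and `struct tsx`, `tsx_add_ref()`, `tsx_dec_ref()`, `tsx_notify()` and `destroy_count` are hypothetical names for illustration only.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a transaction object. */
struct tsx {
    pthread_mutex_t lock;
    int refcount;                 /* guarded by lock */
};

static int destroy_count = 0;     /* number of objects freed so far */

static struct tsx *tsx_create(void)
{
    struct tsx *t = malloc(sizeof(*t));
    if (!t)
        return NULL;
    pthread_mutex_init(&t->lock, NULL);
    t->refcount = 1;              /* the creator owns one reference */
    return t;
}

static void tsx_add_ref(struct tsx *t)
{
    pthread_mutex_lock(&t->lock);
    t->refcount++;
    pthread_mutex_unlock(&t->lock);
}

/* Drops one reference and frees the object when none remain. */
static void tsx_dec_ref(struct tsx *t)
{
    pthread_mutex_lock(&t->lock);
    assert(t->refcount > 0);
    int remaining = --t->refcount;
    pthread_mutex_unlock(&t->lock);
    if (remaining == 0) {
        pthread_mutex_destroy(&t->lock);
        free(t);
        destroy_count++;
    }
}

typedef void (*state_cb)(struct tsx *t);

/* Calls cb without holding t->lock, so cb may take any other lock.
 * The extra reference keeps t alive until cb has returned. */
static void tsx_notify(struct tsx *t, state_cb cb)
{
    tsx_add_ref(t);
    cb(t);
    tsx_dec_ref(t);               /* frees t if cb dropped the other reference */
}

static int survived_callback = 0;

/* Simulates an application callback that releases the owner's reference. */
static void app_callback(struct tsx *t)
{
    tsx_dec_ref(t);
    survived_callback = (destroy_count == 0);   /* t must still be alive here */
}

/* Returns 1 if t outlived the callback and was freed exactly once after it. */
static int demo_notify(void)
{
    destroy_count = 0;
    struct tsx *t = tsx_create();
    if (!t)
        return 0;
    tsx_notify(t, app_callback);
    return survived_callback && destroy_count == 1;
}
```

Calling `demo_notify()` exercises the case where the callback itself gives up the last owning reference: the object is only freed once `tsx_notify()` drops its temporary reference.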

So, thanks for testing it before and please help us test the patch again.

Best regards,
Ming

On Tue, Mar 21, 2017 at 4:05 AM, David Richards <david.brian.richards@xxxxxxxxm> wrote:
Hi Ming.

Thanks for the quick response. However, your patch doesn't fix the problem; it just moves it, as shown in the Thread 1 stack backtrace below. Your patch stopped the PJSUA lock from being obtained in pjsua_call_on_state_changed() (frame 7). However, just two frames later (frame 5), my application calls pjsua_call_get_info() from on_call_state(), where an attempt to take the PJSUA lock is made and deadlocks.

I didn't include the stack backtraces of the other threads because they were essentially blocked the same as before. I admit I don't know this software well, but why was my initial analysis incorrect? Don't the locks always have to be locked in the same order?
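
To make the lock-ordering question concrete, here is a minimal sketch that enforces a single acquisition order by giving each lock a rank and refusing any out-of-order attempt. It uses plain pthreads, not PJSIP's own lock types. The ranks, `ranked_acquire()` and `ranked_release()` are hypothetical names for illustration, and the assumed order (PJSUA lock before the transaction lock) is only what this thread suggests, not a documented rule.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical ranks: a lock with a lower rank must be taken first. */
enum lock_rank { RANK_PJSUA = 1, RANK_TSX = 2 };

struct ranked_lock {
    pthread_mutex_t mutex;
    enum lock_rank  rank;
};

static struct ranked_lock pjsua_lock = { PTHREAD_MUTEX_INITIALIZER, RANK_PJSUA };
static struct ranked_lock tsx_lock   = { PTHREAD_MUTEX_INITIALIZER, RANK_TSX };

/* Highest rank held by the calling thread; 0 means no lock is held. */
static _Thread_local int held_rank = 0;

/* Returns 0 on success, or -1 if taking the lock would break the order.
 * On success *prev_rank receives the rank to restore on release. */
static int ranked_acquire(struct ranked_lock *l, int *prev_rank)
{
    if ((int)l->rank <= held_rank)
        return -1;                       /* refuse instead of risking a deadlock */
    pthread_mutex_lock(&l->mutex);
    *prev_rank = held_rank;
    held_rank = (int)l->rank;
    return 0;
}

static void ranked_release(struct ranked_lock *l, int prev_rank)
{
    assert(held_rank == (int)l->rank);   /* locks are released in reverse order */
    held_rank = prev_rank;
    pthread_mutex_unlock(&l->mutex);
}

/* Consistent order: PJSUA lock first, then the transaction lock. */
static int take_in_order(void)
{
    int p1, p2, rc;
    if (ranked_acquire(&pjsua_lock, &p1) != 0)
        return -1;
    rc = ranked_acquire(&tsx_lock, &p2);
    if (rc == 0)
        ranked_release(&tsx_lock, p2);
    ranked_release(&pjsua_lock, p1);
    return rc;
}

/* Inverted order: transaction lock held, then the PJSUA lock is requested. */
static int take_out_of_order(void)
{
    int p1, p2, rc;
    if (ranked_acquire(&tsx_lock, &p1) != 0)
        return -1;
    rc = ranked_acquire(&pjsua_lock, &p2);   /* rejected: rank 1 after rank 2 */
    if (rc == 0)
        ranked_release(&pjsua_lock, p2);
    ranked_release(&tsx_lock, p1);
    return rc;
}
```

With this check in place, `take_in_order()` succeeds while `take_out_of_order()` is rejected immediately, so an ordering mistake shows up as an error on one thread rather than as a hang across several.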

Thanks for your help,
Dave

Thread 1 (Thread 0x7f1a3935d840 (LWP 27501)):
#0  in __lll_lock_wait () from /lib64/libpthread.so.0
#1  in _L_lock_839 () from /lib64/libpthread.so.0
#2  in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  in pj_mutex_lock (mutex=0x4594238) at ../src/pj/os_core_unix.c:1265
#4  in PJSUA_LOCK () at ../include/pjsua-lib/pjsua_internal.h:575
#5  in pjsua_call_get_info (call_id=1091, info=0x7fff7ce7c610) at ../src/pjsua-lib/pjsua_call.c:1817
#6  in on_call_state (call_id=1091, e=0x7fff7ce7ceb0) at TERMmain.c:509
#7  in pjsua_call_on_state_changed (inv=0x7f1a05272688, e=0x7fff7ce7ceb0) at ../src/pjsua-lib/pjsua_call.c:3824
#8  in inv_set_state (inv=0x7f1a05272688, state=PJSIP_INV_STATE_DISCONNECTED, e=0x7fff7ce7ceb0) at ../src/pjsip-ua/sip_inv.c:317
#9  in inv_on_state_incoming (inv=0x7f1a05272688, e=0x7fff7ce7ceb0) at ../src/pjsip-ua/sip_inv.c:4310
#10 in mod_inv_on_tsx_state (tsx=0x7f1a05273608, e=0x7fff7ce7ceb0) at ../src/pjsip-ua/sip_inv.c:717
#11 in pjsip_dlg_on_tsx_state (dlg=0x7f1a05c55598, tsx=0x7f1a05273608, e=0x7fff7ce7ceb0) at ../src/pjsip/sip_dialog.c:2064
#12 in mod_ua_on_tsx_state (tsx=0x7f1a05273608, e=0x7fff7ce7ceb0) at ../src/pjsip/sip_ua_layer.c:178
#13 in tsx_set_state (tsx=0x7f1a05273608, state=PJSIP_TSX_STATE_COMPLETED, event_src_type=PJSIP_EVENT_TX_MSG, event_src=0x7f1a043c0c38, flag=0) at ../src/pjsip/sip_transaction.c:1235
#14 in tsx_on_state_proceeding_uas (tsx=0x7f1a05273608, event=0x7fff7ce7cfb0) at ../src/pjsip/sip_transaction.c:2819
#15 in pjsip_tsx_send_msg (tsx=0x7f1a05273608, tdata=0x7f1a043c0c38) at ../src/pjsip/sip_transaction.c:1751
#16 in pjsip_dlg_send_response (dlg=0x7f1a05c55598, tsx=0x7f1a05273608, tdata=0x7f1a043c0c38) at ../src/pjsip/sip_dialog.c:1529
#17 in pjsip_inv_send_msg (inv=0x7f1a05272688, tdata=0x7f1a043c0c38) at ../src/pjsip-ua/sip_inv.c:3227
#18 in pjsua_call_hangup (call_id=1091, code=606, reason=0x0, msg_data=0x0) at ../src/pjsua-lib/pjsua_call.c:2426
#19 in timer_callback (timer_heap=0x7f1a15cc85f8, timer=0xbf6e90 <dialer_info+3972080>) at TERMmain.c:313
#20 in pj_timer_heap_poll (ht=0x7f1a15cc85f8, next_delay=0x0) at ../src/pj/timer.c:643
#21 in TIMERpoll () at TIMERmain.c:72
#22 in main_func (argc=1, argv=0x7fff7ce7d788) at main.c:352
#23 in pj_run_app (main_func=0x40c6a7 <main_func>, argc=1, argv=0x7fff7ce7d788, flags=0) at ../src/pj/os_core_unix.c:1952
#24 in main (argc=1, argv=0x7fff7ce7d788) at main.c:782


On Mon, Mar 20, 2017 at 5:33 AM, Ming <ming@xxxxxxxxx> wrote:
Hi all,

Thanks for the report and sorry for the delay in answering.

For the problem reported by David, we think the problem is actually caused by thread 1 (rather than thread 3), which shouldn't try to obtain the PJSUA lock while holding the transaction lock. So we propose the attached fix. Please revert your temporary fix (where you removed the acquisition of the group lock in pjsip_tsx_layer_find_tsx()) and use our patch instead, then let us know whether it resolves the issue. Thanks.
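
As a side note on the general rule, when code that already holds an inner lock really does need an outer one, a common alternative to blocking is try-lock with back-off. The following is a minimal sketch of that technique using plain pthreads. It is not what the attached patch does. `outer_lock`, `inner_lock`, `lock_both()` and `unlock_both()` are hypothetical names, with the two mutexes merely standing in for the PJSUA lock and a transaction lock.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

static pthread_mutex_t outer_lock = PTHREAD_MUTEX_INITIALIZER; /* stands in for the PJSUA lock */
static pthread_mutex_t inner_lock = PTHREAD_MUTEX_INITIALIZER; /* stands in for a transaction lock */

/* Tries to hold both locks without ever blocking on outer_lock while
 * inner_lock is held. Returns 0 with both locks held, or -1 with
 * neither held after max_tries failed attempts. */
static int lock_both(int max_tries)
{
    assert(max_tries > 0);
    for (int i = 0; i < max_tries; i++) {
        pthread_mutex_lock(&inner_lock);
        if (pthread_mutex_trylock(&outer_lock) == 0)
            return 0;                       /* got both */
        pthread_mutex_unlock(&inner_lock);  /* back off so the other owner can progress */
        sched_yield();
    }
    return -1;
}

static void unlock_both(void)
{
    pthread_mutex_unlock(&outer_lock);
    pthread_mutex_unlock(&inner_lock);
}
```

The caller has to handle the -1 case, for example by deferring the work, which is the price of never waiting on the outer lock while the inner one is held.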

As for the deadlock issue reported by Alex, despite the similarity, I believe it is an unrelated issue, so I'll reply to it separately in the original thread.

Finally, for Janu, without the stack trace, I can't be certain that your problem is the same as one or both of these issues. So my suggestion is to apply both patches and see if the problem disappears. Also, I would recommend upgrading to version 2.6, if you haven't already, since 2.3 is quite old.

Best regards,
Ming

On Fri, Mar 17, 2017 at 4:58 PM, janu@xxxxxxxxxxx <janu@xxxxxxxxxxx> wrote:
This also seems like the one I reported (it also happens if the application calls pjsua_call_hangup() between receiving a SIP BYE and sending the corresponding OK).
I also got no response to that.

Regards,
Thomas

------ Original message ------
From: Alex Hermann
Date: Fri, 17 March 2017 09:49
Cc:
Subject: Re: Interesting deadlock bug found causing three threads to deadlock on the PJSUA, UA, and transaction locks.

On Thursday 16 March 2017 14:18:25 CET David Richards wrote:
> I found a bug that causes my application (using the PJSUA API) to deadlock

This looks like the same deadlock I reported on 7-3-2017. Unfortunately, I got no response to it.

--
Alex Hermann

_______________________________________________
Visit our blog: http://blog.pjsip.org

pjsip mailing list
pjsip@xxxxxxxxxxxxxxx
http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org


