NOTE: I typed "This deadlock is almost impossible to fix" - that should read
"This deadlock is almost impossible to reproduce" :)

Steve

On Mon, 4 Jan 2016 at 12:55 Steve Davies <davies147 at gmail.com> wrote:

> This deadlock is almost impossible to fix, so can be considered
> theoretical if you'd like, but I have a test rig which can cause what I am
> describing.
>
> I combined Asterisk-11 latest branch with pjproject-2.4.5 to cause this,
> and my current workaround is to revert to locking behaviour similar to
> pjproject-2.1 just to get me going again.
>
> Attached is a fairly nasty patch that does this. The reasoning is hard to
> explain, but in the file:
>     pjnath/src/pjnath/ice_session.c
> in
>     on_timer()
> the 'ice->grp_lock' is acquired at the start and released at the end, such
> that if the 'ice->cb.on_ice_complete' callback is called, then the lock is
> held while this call completes. This makes perfect sense because the 'ice'
> data is being passed into that callback, but...
>
> In a threaded application (Asterisk), though, this can result in a
> reversed locking order, and a deadlock. The following is a horribly
> simplified demonstration of 2 threads deadlocking:
>
> Thread1:
>  - Call into ice_session.c:on_timer()
>  - Locks ice->grp_lock
>  - Callback to ice->cb.on_ice_complete()
>  - Application tries to lock its own APP_LOCK
>
> Thread2:
>  - Non-PJ application code...
>  - Locks its own APP_LOCK
>  - Call into ice_session.c:pj_ice_sess_send_data()
>  - Tries to lock ice->grp_lock
>
> As I mentioned, this is VERY hard to demonstrate, but can be made to
> happen. I am sure that the attached patch is not the best solution, but it
> is relatively harmless AFAIK and keeps me going.
>
> As always, feedback and comments requested.
>
> Regards,
> Steve
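
For readers who want to see the two lock orders side by side, here is a minimal sketch in plain pthreads. It does not use pjproject and is not the attached patch; `grp_lock`, `app_lock`, `on_complete()`, `timer_thread()`, `app_thread()` and `run_demo()` are all hypothetical stand-ins for `ice->grp_lock`, APP_LOCK, the `on_ice_complete` callback and the two threads described above. It shows one common way to avoid the inversion, under the assumption that the callback only needs a snapshot of the state: the timer path drops its lock before calling back, so the only nested order left in the program is APP_LOCK then grp_lock.

```c
#include <assert.h>
#include <pthread.h>

#define ITERATIONS 10000

/* Hypothetical stand-ins for ice->grp_lock and the application's APP_LOCK. */
static pthread_mutex_t grp_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t app_lock = PTHREAD_MUTEX_INITIALIZER;

static int ice_state = 0;   /* guarded by grp_lock */
static int app_events = 0;  /* guarded by app_lock */

/* Stand-in for the application's on_ice_complete handler: it takes the
 * application lock, as the Thread1 steps above describe. */
static void on_complete(int status)
{
    pthread_mutex_lock(&app_lock);
    app_events += status;
    pthread_mutex_unlock(&app_lock);
}

/* Thread1 analogue: the timer path. The state is updated and a snapshot is
 * taken under grp_lock, then the lock is released BEFORE the callback runs,
 * so this thread never holds grp_lock while waiting for app_lock. */
static void *timer_thread(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&grp_lock);
        ice_state++;
        int status = 1;                  /* snapshot handed to the callback */
        pthread_mutex_unlock(&grp_lock);
        on_complete(status);
    }
    return NULL;
}

/* Thread2 analogue: application code that takes app_lock and then calls
 * into the library, which takes grp_lock (order: app_lock -> grp_lock). */
static void *app_thread(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&app_lock);
        pthread_mutex_lock(&grp_lock);
        ice_state++;
        pthread_mutex_unlock(&grp_lock);
        pthread_mutex_unlock(&app_lock);
    }
    return NULL;
}

/* Runs both threads to completion and returns the number of callbacks that
 * were delivered. Intended to be called once per process. */
int run_demo(void)
{
    pthread_t t1, t2;
    int rc;

    rc = pthread_create(&t1, NULL, timer_thread, NULL);
    assert(rc == 0);
    rc = pthread_create(&t2, NULL, app_thread, NULL);
    assert(rc == 0);

    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return app_events;  /* safe to read: both threads have been joined */
}
```

If `timer_thread()` instead called `on_complete()` while still holding `grp_lock`, the two threads would acquire the locks in opposite orders and could block each other exactly as in the Thread1/Thread2 listing. The trade-off is the one noted in the original message: once the lock is released, the callback can no longer assume the session data is stable, so real code would need some other guarantee (for example reference counting) that the object stays alive for the duration of the call.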