I am using PJSIP 1.1 to write an IVR application. The application would dial a list of outgoing number, play a pre-recorded wav file and ask the user to press a DTMF to confirm. The application works fine for small number of concurrent calls (<5), but with a high number of concurrent calls (10 and above), there is a deadlock after a certain number of calls (the exact number could vary but is around 500 on average). In all deadlock situations, the behaviour causing the deadlock is always the same. Strangely, repeating the behaviour on its own does not replicate the problem. The steps leading to the deadlock is as follows: 1. The program dials a number, the phone rings and is picked up by the remote user. The application's on_call_state fires and the call is in the 'CONFIRMED' state, pjsua_call_info.state = PJSIP_INV_STATE_CONFIRMED. 2. My program begins to transmit the WAV file to the above call which was picked up. 3. Shortly after the WAV playback is done. another on_call_state event takes places. The state of the call is PJSIP_INV_STATE_EARLY (Session Progress), which indicates that the remote phone is ringing. 4. A deadlock happens at this stage, and shortly afterwards, any PJSIP command will fail with "Timed-out trying to acquire PJSUA mutex". Some analysis using Visual Studio reveals that the deadlock is caused by the following 3 threads *****Thread 1 (conference.c)****** PJ_DEF(pj_status_t) pjmedia_conf_remove_port( pjmedia_conf *conf, unsigned port ) { ......... pj_mutex_lock(conf->mutex); ........ } ****Calling stack for Thread 1**** pj_mutex_lock(pj_mutex_t * mutex=0x04a692ac) Line 911 + 0xc bytes C pjmedia_conf_remove_port(pjmedia_conf * conf=0x033229fc, unsigned int port=3) Line 1069 + 0xc bytes C pjsua_conf_remove_port(int id=3) Line 1631 + 0x10 bytes C stop_media_session(int call_id=72) Line 1230 + 0xc bytes C pjsua_media_channel_deinit(int call_id=72) Line 1271 + 0x9 bytes C pjsua_call_on_state_changed(pjsip_inv_session * inv=0x05cd3e28, pjsip_event * e=0x05a1058c) Line 2970 + 0xb bytes C inv_set_state(pjsip_inv_session * inv=0x05cd3e28, pjsip_inv_state state=PJSIP_INV_STATE_DISCONNECTED, pjsip_event * e=0x05a1058c) Line 219 + 0x10 bytes C handle_uac_call_rejection(pjsip_inv_session * inv=0x05cd3e28, pjsip_event * e=0x05a1058c) Line 3020 + 0xf bytes C inv_on_state_early(pjsip_inv_session * inv=0x05cd3e28, pjsip_event * e=0x05a1058c) Line 3256 + 0xd bytes C mod_inv_on_tsx_state(pjsip_transaction * tsx=0x05e270cc, pjsip_event * e=0x05a1058c) Line 535 + 0x19 bytes C pjsip_dlg_on_tsx_state(pjsip_dialog * dlg=0x05cd377c, pjsip_transaction * tsx=0x05e270cc, pjsip_event * e=0x05a1058c) Line 1907 + 0x1c bytes C mod_ua_on_tsx_state(pjsip_transaction * tsx=0x05e270cc, pjsip_event * e=0x05a1058c) Line 178 + 0x11 bytes C tsx_set_state(pjsip_transaction * tsx=0x05e270cc, pjsip_tsx_state_e state=PJSIP_TSX_STATE_COMPLETED, pjsip_event_id_e event_src_type=PJSIP_EVENT_RX_MSG, void * event_src=0x04a6c3ec) Line 1095 + 0x15 bytes C tsx_on_state_proceeding_uac(pjsip_transaction * tsx=0x05e270cc, pjsip_event * event=0x05a107f8) Line 2642 + 0x14 bytes C pjsip_tsx_recv_msg(pjsip_transaction * tsx=0x05e270cc, pjsip_rx_data * rdata=0x04a6c3ec) Line 1570 + 0x12 bytes C mod_tsx_layer_on_rx_response(pjsip_rx_data * rdata=0x04a6c3ec) Line 817 + 0xd bytes C endpt_on_rx_msg(pjsip_endpoint * endpt=0x03327f1c, int status=0, pjsip_rx_data * rdata=0x04a6c3ec) Line 927 + 0x11 bytes C pjsip_tpmgr_receive_packet(pjsip_tpmgr * mgr=0x048dc2a8, pjsip_rx_data * rdata=0x04a6c3ec) Line 1457 + 0x17 bytes C udp_on_read_complete(pj_ioqueue_key_t * key=0x047aed88, pj_ioqueue_op_key_t * op_key=0x04a6c3f8, long bytes_read=482) Line 166 + 0x16 bytes C ioqueue_dispatch_read_event(pj_ioqueue_t * ioqueue=0x047a007c, pj_ioqueue_key_t * h=0x047aed88) Line 552 + 0x16 bytes C pj_ioqueue_poll(pj_ioqueue_t * ioqueue=0x047a007c, const pj_time_val * timeout=0x05a1fccc) Line 764 + 0x17 bytes C pjsip_endpt_handle_events2(pjsip_endpoint * endpt=0x03327f1c, const pj_time_val * max_timeout=0x05a1fdc8, unsigned int * p_count=0x05a1fdd8) Line 719 + 0x10 bytes C pjsua_handle_events(unsigned int msec_timeout=10) Line 1217 + 0x14 bytes C worker_thread(void * arg=0x00000000) Line 519 + 0x7 bytes C thread_main(void * param=0x04a68aa4) Line 414 + 0x11 bytes C *****Thread 2 (stream.c)****** static pj_status_t get_frame( pjmedia_port *port, pjmedia_frame *frame) { ....... /* Lock jitter buffer mutex first */ pj_mutex_lock( stream->jb_mutex ); ....... } ****Calling stack for Thread 2**** pj_mutex_lock(pj_mutex_t * mutex=0x05ccf534) Line 911 + 0xc bytes C get_frame(pjmedia_port * port=0x05ccefec, pjmedia_frame * frame=0x05b1f96c) Line 222 + 0xf bytes C pjmedia_port_get_frame(pjmedia_port * port=0x05ccefec, pjmedia_frame * frame=0x05b1f96c) Line 68 + 0x12 bytes C read_port(pjmedia_conf * conf=0x033229fc, conf_port * cport=0x05cc9f80, short * frame=0x04a6acf8, unsigned int count=320, pjmedia_frame_type * type=0x05b1fb20) Line 1363 + 0x10 bytes C get_frame(pjmedia_port * this_port=0x04a5becc, pjmedia_frame * frame=0x05b1fd80) Line 1771 + 0x22 bytes C pjmedia_port_get_frame(pjmedia_port * port=0x04a5becc, pjmedia_frame * frame=0x05b1fd80) Line 68 + 0x12 bytes C clock_callback(const pj_timestamp * ts=0x04a6afdc, void * user_data=0x04a6acdc) Line 181 + 0x10 bytes C clock_thread(void * arg=0x04a6afc4) Line 255 + 0x18 bytes C thread_main(void * param=0x04a6b00c) Line 414 + 0x11 bytes C --------- It seems that thread 1 is waiting on a lock which is locked by thread 2 (conf->mutex->OwningThread points to thread 2's ID) and thread 2 is waiting on a lock which is locked by thread 3 (stream->jb_mutex->OwningThread points to thread 3's ID). Thread 3 details is as follows: *****Thread 3 (jbuf.c)****** jb_framelist_put_at(jb_framelist_t * framelist=0x05ccf574, unsigned int index=15447332, const void * frame=0x05baccb4, unsigned int frame_size=80, unsigned int bit_info=3435973836) Line 241 + 0x1c bytes C pjmedia_jbuf_put_frame2(pjmedia_jbuf * jb=0x05ccf56c, const void * frame=0x05baccb4, unsigned int frame_size=80, unsigned int bit_info=3435973836, int frame_seq=15447332, int * discarded=0x05910b2c) Line 546 + 0x1c bytes C on_rx_rtp(void * data=0x05ccefe4, void * pkt=0x05bacca8, long bytes_read=172) Line 1404 + 0x47 bytes C srtp_rtp_cb(void * user_data=0x05cd2704, void * pkt=0x05bacca8, long size=172) Line 809 + 0x1f bytes C on_rx_rtp(pj_ioqueue_key_t * key=0x048f8674, pj_ioqueue_op_key_t * op_key=0x05bab268, long bytes_read=172) Line 471 + 0x17 bytes C ioqueue_dispatch_read_event(pj_ioqueue_t * ioqueue=0x048df834, pj_ioqueue_key_t * h=0x048f8674) Line 552 + 0x16 bytes C pj_ioqueue_poll(pj_ioqueue_t * ioqueue=0x048df834, const pj_time_val * timeout=0x0591feac) Line 764 + 0x17 bytes C worker_proc(void * arg=0x048dd91c) Line 269 + 0x13 bytes C thread_main(void * param=0x04a4d940) Line 414 + 0x11 bytes C Unlike thread 1 and 2 which are frozen waiting for the locks, thread 3 is still running and keeps on printing the following message: 19:06:42.156 ec041832F0 Underflow, buf_cnt=0, will generate 1 frame 19:06:42.156 Master/sound 257 samples reduced, buf_cnt=656 19:06:42.156 ec041832F0 Underflow, buf_cnt=0, will generate 1 frame 19:06:42.156 Master/sound 146 samples reduced, buf_cnt=670 19:06:42.156 ec041832F0 Underflow, buf_cnt=0, will generate 1 frame 19:06:42.156 Master/sound 185 samples reduced, buf_cnt=645 19:06:42.156 ec041832F0 Underflow, buf_cnt=0, will generate 1 frame 19:06:42.156 Master/sound 164 samples reduced, buf_cnt=641 19:06:42.156 ec041832F0 Underflow, buf_cnt=0, will generate 1 frame .......... Thread 3 never stops printing the message and never releases the lock (stream->jb_mutex). Therefore, a deadlock occurs between thread 1, 2 and thread 3. >From what I understand, thread 1 attempts to add a new conference port for the call which is in 'EARLY' stage. Thread 2 is responsible for maintaining the audio stream between the conference port. Thread 3 is responsible for sending/receiving socket data. I could not figured out the exact location where threads 3 acquire the lock, thus causing the deadlock later on. Any ideas on how I can fix this problem? Is this a bug with PJSIP, or has it got anything to do with my code? Any advice is appreciated. Thanks in advance. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.pjsip.org/pipermail/pjsip_lists.pjsip.org/attachments/20090614/5262a942/attachment-0001.html>