[PJSIP 1.1] deadlock on high call volume

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Yea I think I might have been there done that on this issue.  Question:
when you do a play/record/conference add/remove do you do it from a 
pjsip callback function?
If so I think that is happening is if the conference bring is 
processing a frame while the callback happens
the conf mutex is locked which causes the pjsua_ function to lock the 
pjsua mutex and then block on the
conf mutex.

Solution: in the callback record data, like ANI, DNIS, conf_slot, 
etc.  But do the pjsua_ calls in a seperate
thread.

David Clark



At 10:02 AM 6/14/2009, Nguyen Van Minh Danh wrote:
>I am using PJSIP 1.1 to write an IVR application. The application 
>would dial a list of outgoing number, play a pre-recorded wav file 
>and ask the user to press a DTMF to confirm. The application works 
>fine for small number of concurrent calls (<5), but with a high 
>number of concurrent calls (10 and above), there is a deadlock after 
>a certain number of calls (the exact number could vary but is around 
>500 on average).
>
>In all deadlock situations, the behaviour causing the deadlock is 
>always the same. Strangely, repeating the behaviour on its own does 
>not replicate the problem. The steps leading to the deadlock is as follows:
>
>1. The program dials a number, the phone rings and is picked up by 
>the remote user. The application's on_call_state fires and the call 
>is in the 'CONFIRMED' state, pjsua_call_info.state = PJSIP_INV_STATE_CONFIRMED.
>2. My program begins to transmit the WAV file to the above call 
>which was picked up.
>3. Shortly after the WAV playback is done. another on_call_state 
>event takes places. The state of the call is PJSIP_INV_STATE_EARLY 
>(Session Progress), which indicates that the remote phone is ringing.
>4. A deadlock happens at this stage, and shortly afterwards, any 
>PJSIP command will fail with "Timed-out trying to acquire PJSUA mutex".
>
>Some analysis using Visual Studio reveals that the deadlock is 
>caused by the following 3 threads
>
>*****Thread 1 (conference.c)******
>
>PJ_DEF(pj_status_t) pjmedia_conf_remove_port( pjmedia_conf *conf,
>            unsigned port )
>{
>.........
>     pj_mutex_lock(conf->mutex);
>........
>}
>
>****Calling stack for Thread 1****
>
>  pj_mutex_lock(pj_mutex_t * mutex=0x04a692ac)  Line 911 + 0xc bytes C
>   pjmedia_conf_remove_port(pjmedia_conf * conf=0x033229fc, unsigned 
> int port=3)  Line 1069 + 0xc bytes C
>   pjsua_conf_remove_port(int id=3)  Line 1631 + 0x10 bytes C
>   stop_media_session(int call_id=72)  Line 1230 + 0xc bytes C
>   pjsua_media_channel_deinit(int call_id=72)  Line 1271 + 0x9 bytes C
>   pjsua_call_on_state_changed(pjsip_inv_session * inv=0x05cd3e28, 
> pjsip_event * e=0x05a1058c)  Line 2970 + 0xb bytes C
>   inv_set_state(pjsip_inv_session * inv=0x05cd3e28, pjsip_inv_state 
> state=PJSIP_INV_STATE_DISCONNECTED, pjsip_event * 
> e=0x05a1058c)  Line 219 + 0x10 bytes C
>   handle_uac_call_rejection(pjsip_inv_session * inv=0x05cd3e28, 
> pjsip_event * e=0x05a1058c)  Line 3020 + 0xf bytes C
>   inv_on_state_early(pjsip_inv_session * inv=0x05cd3e28, 
> pjsip_event * e=0x05a1058c)  Line 3256 + 0xd bytes C
>   mod_inv_on_tsx_state(pjsip_transaction * tsx=0x05e270cc, 
> pjsip_event * e=0x05a1058c)  Line 535 + 0x19 bytes C
>   pjsip_dlg_on_tsx_state(pjsip_dialog * dlg=0x05cd377c, 
> pjsip_transaction * tsx=0x05e270cc, pjsip_event * 
> e=0x05a1058c)  Line 1907 + 0x1c bytes C
>   mod_ua_on_tsx_state(pjsip_transaction * tsx=0x05e270cc, 
> pjsip_event * e=0x05a1058c)  Line 178 + 0x11 bytes C
>   tsx_set_state(pjsip_transaction * tsx=0x05e270cc, 
> pjsip_tsx_state_e state=PJSIP_TSX_STATE_COMPLETED, pjsip_event_id_e 
> event_src_type=PJSIP_EVENT_RX_MSG, void * 
> event_src=0x04a6c3ec)  Line 1095 + 0x15 bytes C
>   tsx_on_state_proceeding_uac(pjsip_transaction * tsx=0x05e270cc, 
> pjsip_event * event=0x05a107f8)  Line 2642 + 0x14 bytes C
>   pjsip_tsx_recv_msg(pjsip_transaction * tsx=0x05e270cc, 
> pjsip_rx_data * rdata=0x04a6c3ec)  Line 1570 + 0x12 bytes C
>   mod_tsx_layer_on_rx_response(pjsip_rx_data * 
> rdata=0x04a6c3ec)  Line 817 + 0xd bytes C
>   endpt_on_rx_msg(pjsip_endpoint * endpt=0x03327f1c, int status=0, 
> pjsip_rx_data * rdata=0x04a6c3ec)  Line 927 + 0x11 bytes C
>   pjsip_tpmgr_receive_packet(pjsip_tpmgr * mgr=0x048dc2a8, 
> pjsip_rx_data * rdata=0x04a6c3ec)  Line 1457 + 0x17 bytes C
>   udp_on_read_complete(pj_ioqueue_key_t * key=0x047aed88, 
> pj_ioqueue_op_key_t * op_key=0x04a6c3f8, long bytes_read=482)  Line 
> 166 + 0x16 bytes C
>   ioqueue_dispatch_read_event(pj_ioqueue_t * ioqueue=0x047a007c, 
> pj_ioqueue_key_t * h=0x047aed88)  Line 552 + 0x16 bytes C
>   pj_ioqueue_poll(pj_ioqueue_t * ioqueue=0x047a007c, const 
> pj_time_val * timeout=0x05a1fccc)  Line 764 + 0x17 bytes C
>   pjsip_endpt_handle_events2(pjsip_endpoint * endpt=0x03327f1c, 
> const pj_time_val * max_timeout=0x05a1fdc8, unsigned int * 
> p_count=0x05a1fdd8)  Line 719 + 0x10 bytes C
>   pjsua_handle_events(unsigned int msec_timeout=10)  Line 1217 + 0x14 bytes C
>   worker_thread(void * arg=0x00000000)  Line 519 + 0x7 bytes C
>   thread_main(void * param=0x04a68aa4)  Line 414 + 0x11 bytes C
>
>*****Thread 2 (stream.c)******
>static pj_status_t get_frame( pjmedia_port *port, pjmedia_frame *frame)
>{
>.......
>/* Lock jitter buffer mutex first */
>pj_mutex_lock( stream->jb_mutex );
>.......
>}
>
>****Calling stack for Thread 2****
>
>   pj_mutex_lock(pj_mutex_t * mutex=0x05ccf534)  Line 911 + 0xc bytes C
>   get_frame(pjmedia_port * port=0x05ccefec, pjmedia_frame * 
> frame=0x05b1f96c)  Line 222 + 0xf bytes C
>   pjmedia_port_get_frame(pjmedia_port * port=0x05ccefec, 
> pjmedia_frame * frame=0x05b1f96c)  Line 68 + 0x12 bytes C
>   read_port(pjmedia_conf * conf=0x033229fc, conf_port * 
> cport=0x05cc9f80, short * frame=0x04a6acf8, unsigned int count=320, 
> pjmedia_frame_type * type=0x05b1fb20)  Line 1363 + 0x10 bytes C
>   get_frame(pjmedia_port * this_port=0x04a5becc, pjmedia_frame * 
> frame=0x05b1fd80)  Line 1771 + 0x22 bytes C
>   pjmedia_port_get_frame(pjmedia_port * port=0x04a5becc, 
> pjmedia_frame * frame=0x05b1fd80)  Line 68 + 0x12 bytes C
>   clock_callback(const pj_timestamp * ts=0x04a6afdc, void * 
> user_data=0x04a6acdc)  Line 181 + 0x10 bytes C
>   clock_thread(void * arg=0x04a6afc4)  Line 255 + 0x18 bytes C
>   thread_main(void * param=0x04a6b00c)  Line 414 + 0x11 bytes C
>
>---------
>
>It seems that thread 1 is waiting on a lock which is locked by 
>thread 2 (conf->mutex->OwningThread points to thread 2's ID) and 
>thread 2 is waiting on a lock which is locked by thread 3 
>(stream->jb_mutex->OwningThread points to thread 3's ID). Thread 3 
>details is as follows:
>
>*****Thread 3 (jbuf.c)******
>
>   jb_framelist_put_at(jb_framelist_t * framelist=0x05ccf574, 
> unsigned int index=15447332, const void * frame=0x05baccb4, 
> unsigned int frame_size=80, unsigned int bit_info=3435973836)  Line 
> 241 + 0x1c bytes C
>   pjmedia_jbuf_put_frame2(pjmedia_jbuf * jb=0x05ccf56c, const void 
> * frame=0x05baccb4, unsigned int frame_size=80, unsigned int 
> bit_info=3435973836, int frame_seq=15447332, int * 
> discarded=0x05910b2c)  Line 546 + 0x1c bytes C
>   on_rx_rtp(void * data=0x05ccefe4, void * pkt=0x05bacca8, long 
> bytes_read=172)  Line 1404 + 0x47 bytes C
>   srtp_rtp_cb(void * user_data=0x05cd2704, void * pkt=0x05bacca8, 
> long size=172)  Line 809 + 0x1f bytes C
>   on_rx_rtp(pj_ioqueue_key_t * key=0x048f8674, pj_ioqueue_op_key_t 
> * op_key=0x05bab268, long bytes_read=172)  Line 471 + 0x17 bytes C
>   ioqueue_dispatch_read_event(pj_ioqueue_t * ioqueue=0x048df834, 
> pj_ioqueue_key_t * h=0x048f8674)  Line 552 + 0x16 bytes C
>   pj_ioqueue_poll(pj_ioqueue_t * ioqueue=0x048df834, const 
> pj_time_val * timeout=0x0591feac)  Line 764 + 0x17 bytes C
>   worker_proc(void * arg=0x048dd91c)  Line 269 + 0x13 bytes C
>   thread_main(void * param=0x04a4d940)  Line 414 + 0x11 bytes C
>Unlike thread 1 and 2 which are frozen waiting for the locks, thread 
>3 is still running and keeps on printing the following message:
>
>  19:06:42.156     ec041832F0  Underflow, buf_cnt=0, will generate 1 frame
>  19:06:42.156   Master/sound  257 samples reduced, buf_cnt=656
>  19:06:42.156     ec041832F0  Underflow, buf_cnt=0, will generate 1 frame
>  19:06:42.156   Master/sound  146 samples reduced, buf_cnt=670
>  19:06:42.156     ec041832F0  Underflow, buf_cnt=0, will generate 1 frame
>  19:06:42.156   Master/sound  185 samples reduced, buf_cnt=645
>  19:06:42.156     ec041832F0  Underflow, buf_cnt=0, will generate 1 frame
>  19:06:42.156   Master/sound  164 samples reduced, buf_cnt=641
>  19:06:42.156     ec041832F0  Underflow, buf_cnt=0, will generate 1 frame
>..........
>
>Thread 3 never stops printing the message and never releases the 
>lock (stream->jb_mutex). Therefore, a deadlock occurs between thread 
>1, 2 and thread 3.
>
> From what I understand, thread 1 attempts to add a new conference 
> port for the call which is in 'EARLY' stage. Thread 2 is 
> responsible for maintaining the audio stream between the conference 
> port. Thread 3 is responsible for sending/receiving socket data. I 
> could not figured out the exact location where threads 3 acquire 
> the lock, thus causing the deadlock later on.
>
>Any ideas on how I can fix this problem? Is this a bug with PJSIP, 
>or has it got anything to do with my code?
>
>Any advice is appreciated. Thanks in advance.
>
>
>_______________________________________________
>Visit our blog: http://blog.pjsip.org
>
>pjsip mailing list
>pjsip at lists.pjsip.org
>http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pjsip.org/pipermail/pjsip_lists.pjsip.org/attachments/20090615/e8829713/attachment.html>


[Index of Archives]     [Asterisk Users]     [Asterisk App Development]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [Linux API]
  Powered by Linux