[PJSIP 1.1] deadlock on high call volume

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I am using PJSIP 1.1 to write an IVR application. The application would dial
a list of outgoing number, play a pre-recorded wav file and ask the user to
press a DTMF to confirm. The application works fine for small number of
concurrent calls (<5), but with a high number of concurrent calls (10 and
above), there is a deadlock after a certain number of calls (the exact
number could vary but is around 500 on average).
 
In all deadlock situations, the behaviour causing the deadlock is always the
same. Strangely, repeating the behaviour on its own does not replicate the
problem. The steps leading to the deadlock is as follows:
 
1. The program dials a number, the phone rings and is picked up by the
remote user. The application's on_call_state fires and the call is in the
'CONFIRMED' state, pjsua_call_info.state = PJSIP_INV_STATE_CONFIRMED.
2. My program begins to transmit the WAV file to the above call which was
picked up.
3. Shortly after the WAV playback is done. another on_call_state event takes
places. The state of the call is PJSIP_INV_STATE_EARLY (Session Progress),
which indicates that the remote phone is ringing.
4. A deadlock happens at this stage, and shortly afterwards, any PJSIP
command will fail with "Timed-out trying to acquire PJSUA mutex".
 
Some analysis using Visual Studio reveals that the deadlock is caused by the
following 3 threads
 
*****Thread 1 (conference.c)******
 
PJ_DEF(pj_status_t) pjmedia_conf_remove_port( pjmedia_conf *conf,
           unsigned port )
{
.........
    pj_mutex_lock(conf->mutex);
........
}
 
****Calling stack for Thread 1****
 
 pj_mutex_lock(pj_mutex_t * mutex=0x04a692ac)  Line 911 + 0xc bytes C
  pjmedia_conf_remove_port(pjmedia_conf * conf=0x033229fc, unsigned int
port=3)  Line 1069 + 0xc bytes C
  pjsua_conf_remove_port(int id=3)  Line 1631 + 0x10 bytes C
  stop_media_session(int call_id=72)  Line 1230 + 0xc bytes C
  pjsua_media_channel_deinit(int call_id=72)  Line 1271 + 0x9 bytes C
  pjsua_call_on_state_changed(pjsip_inv_session * inv=0x05cd3e28,
pjsip_event * e=0x05a1058c)  Line 2970 + 0xb bytes C
  inv_set_state(pjsip_inv_session * inv=0x05cd3e28, pjsip_inv_state
state=PJSIP_INV_STATE_DISCONNECTED, pjsip_event * e=0x05a1058c)  Line 219 +
0x10 bytes C
  handle_uac_call_rejection(pjsip_inv_session * inv=0x05cd3e28, pjsip_event
* e=0x05a1058c)  Line 3020 + 0xf bytes C
  inv_on_state_early(pjsip_inv_session * inv=0x05cd3e28, pjsip_event *
e=0x05a1058c)  Line 3256 + 0xd bytes C
  mod_inv_on_tsx_state(pjsip_transaction * tsx=0x05e270cc, pjsip_event *
e=0x05a1058c)  Line 535 + 0x19 bytes C
  pjsip_dlg_on_tsx_state(pjsip_dialog * dlg=0x05cd377c, pjsip_transaction *
tsx=0x05e270cc, pjsip_event * e=0x05a1058c)  Line 1907 + 0x1c bytes C
  mod_ua_on_tsx_state(pjsip_transaction * tsx=0x05e270cc, pjsip_event *
e=0x05a1058c)  Line 178 + 0x11 bytes C
  tsx_set_state(pjsip_transaction * tsx=0x05e270cc, pjsip_tsx_state_e
state=PJSIP_TSX_STATE_COMPLETED, pjsip_event_id_e
event_src_type=PJSIP_EVENT_RX_MSG, void * event_src=0x04a6c3ec)  Line 1095 +
0x15 bytes C
  tsx_on_state_proceeding_uac(pjsip_transaction * tsx=0x05e270cc,
pjsip_event * event=0x05a107f8)  Line 2642 + 0x14 bytes C
  pjsip_tsx_recv_msg(pjsip_transaction * tsx=0x05e270cc, pjsip_rx_data *
rdata=0x04a6c3ec)  Line 1570 + 0x12 bytes C
  mod_tsx_layer_on_rx_response(pjsip_rx_data * rdata=0x04a6c3ec)  Line 817 +
0xd bytes C
  endpt_on_rx_msg(pjsip_endpoint * endpt=0x03327f1c, int status=0,
pjsip_rx_data * rdata=0x04a6c3ec)  Line 927 + 0x11 bytes C
  pjsip_tpmgr_receive_packet(pjsip_tpmgr * mgr=0x048dc2a8, pjsip_rx_data *
rdata=0x04a6c3ec)  Line 1457 + 0x17 bytes C
  udp_on_read_complete(pj_ioqueue_key_t * key=0x047aed88,
pj_ioqueue_op_key_t * op_key=0x04a6c3f8, long bytes_read=482)  Line 166 +
0x16 bytes C
  ioqueue_dispatch_read_event(pj_ioqueue_t * ioqueue=0x047a007c,
pj_ioqueue_key_t * h=0x047aed88)  Line 552 + 0x16 bytes C
  pj_ioqueue_poll(pj_ioqueue_t * ioqueue=0x047a007c, const pj_time_val *
timeout=0x05a1fccc)  Line 764 + 0x17 bytes C
  pjsip_endpt_handle_events2(pjsip_endpoint * endpt=0x03327f1c, const
pj_time_val * max_timeout=0x05a1fdc8, unsigned int * p_count=0x05a1fdd8)
Line 719 + 0x10 bytes C
  pjsua_handle_events(unsigned int msec_timeout=10)  Line 1217 + 0x14 bytes
C
  worker_thread(void * arg=0x00000000)  Line 519 + 0x7 bytes C
  thread_main(void * param=0x04a68aa4)  Line 414 + 0x11 bytes C
 
*****Thread 2 (stream.c)******
static pj_status_t get_frame( pjmedia_port *port, pjmedia_frame *frame)
{
.......
/* Lock jitter buffer mutex first */
pj_mutex_lock( stream->jb_mutex );
.......
}
 
****Calling stack for Thread 2****
 
  pj_mutex_lock(pj_mutex_t * mutex=0x05ccf534)  Line 911 + 0xc bytes C
  get_frame(pjmedia_port * port=0x05ccefec, pjmedia_frame *
frame=0x05b1f96c)  Line 222 + 0xf bytes C
  pjmedia_port_get_frame(pjmedia_port * port=0x05ccefec, pjmedia_frame *
frame=0x05b1f96c)  Line 68 + 0x12 bytes C
  read_port(pjmedia_conf * conf=0x033229fc, conf_port * cport=0x05cc9f80,
short * frame=0x04a6acf8, unsigned int count=320, pjmedia_frame_type *
type=0x05b1fb20)  Line 1363 + 0x10 bytes C
  get_frame(pjmedia_port * this_port=0x04a5becc, pjmedia_frame *
frame=0x05b1fd80)  Line 1771 + 0x22 bytes C
  pjmedia_port_get_frame(pjmedia_port * port=0x04a5becc, pjmedia_frame *
frame=0x05b1fd80)  Line 68 + 0x12 bytes C
  clock_callback(const pj_timestamp * ts=0x04a6afdc, void *
user_data=0x04a6acdc)  Line 181 + 0x10 bytes C
  clock_thread(void * arg=0x04a6afc4)  Line 255 + 0x18 bytes C
  thread_main(void * param=0x04a6b00c)  Line 414 + 0x11 bytes C
 
---------
 
It seems that thread 1 is waiting on a lock which is locked by thread 2
(conf->mutex->OwningThread points to thread 2's ID) and thread 2 is waiting
on a lock which is locked by thread 3 (stream->jb_mutex->OwningThread points
to thread 3's ID). Thread 3 details is as follows:
 
*****Thread 3 (jbuf.c)******
 
  jb_framelist_put_at(jb_framelist_t * framelist=0x05ccf574, unsigned int
index=15447332, const void * frame=0x05baccb4, unsigned int frame_size=80,
unsigned int bit_info=3435973836)  Line 241 + 0x1c bytes C
  pjmedia_jbuf_put_frame2(pjmedia_jbuf * jb=0x05ccf56c, const void *
frame=0x05baccb4, unsigned int frame_size=80, unsigned int
bit_info=3435973836, int frame_seq=15447332, int * discarded=0x05910b2c)
Line 546 + 0x1c bytes C
  on_rx_rtp(void * data=0x05ccefe4, void * pkt=0x05bacca8, long
bytes_read=172)  Line 1404 + 0x47 bytes C
  srtp_rtp_cb(void * user_data=0x05cd2704, void * pkt=0x05bacca8, long
size=172)  Line 809 + 0x1f bytes C
  on_rx_rtp(pj_ioqueue_key_t * key=0x048f8674, pj_ioqueue_op_key_t *
op_key=0x05bab268, long bytes_read=172)  Line 471 + 0x17 bytes C
  ioqueue_dispatch_read_event(pj_ioqueue_t * ioqueue=0x048df834,
pj_ioqueue_key_t * h=0x048f8674)  Line 552 + 0x16 bytes C
  pj_ioqueue_poll(pj_ioqueue_t * ioqueue=0x048df834, const pj_time_val *
timeout=0x0591feac)  Line 764 + 0x17 bytes C
  worker_proc(void * arg=0x048dd91c)  Line 269 + 0x13 bytes C
  thread_main(void * param=0x04a4d940)  Line 414 + 0x11 bytes C

Unlike thread 1 and 2 which are frozen waiting for the locks, thread 3 is
still running and keeps on printing the following message:
 
 19:06:42.156     ec041832F0  Underflow, buf_cnt=0, will generate 1 frame
 19:06:42.156   Master/sound  257 samples reduced, buf_cnt=656
 19:06:42.156     ec041832F0  Underflow, buf_cnt=0, will generate 1 frame
 19:06:42.156   Master/sound  146 samples reduced, buf_cnt=670
 19:06:42.156     ec041832F0  Underflow, buf_cnt=0, will generate 1 frame
 19:06:42.156   Master/sound  185 samples reduced, buf_cnt=645
 19:06:42.156     ec041832F0  Underflow, buf_cnt=0, will generate 1 frame
 19:06:42.156   Master/sound  164 samples reduced, buf_cnt=641
 19:06:42.156     ec041832F0  Underflow, buf_cnt=0, will generate 1 frame
..........
 
Thread 3 never stops printing the message and never releases the lock
(stream->jb_mutex). Therefore, a deadlock occurs between thread 1, 2 and
thread 3.
 
>From what I understand, thread 1 attempts to add a new conference port for
the call which is in 'EARLY' stage. Thread 2 is responsible for maintaining
the audio stream between the conference port. Thread 3 is responsible for
sending/receiving socket data. I could not figured out the exact location
where threads 3 acquire the lock, thus causing the deadlock later on.
 
Any ideas on how I can fix this problem? Is this a bug with PJSIP, or has it
got anything to do with my code?
 
Any advice is appreciated. Thanks in advance.
 
 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pjsip.org/pipermail/pjsip_lists.pjsip.org/attachments/20090614/5262a942/attachment-0001.html>


[Index of Archives]     [Asterisk Users]     [Asterisk App Development]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [Linux API]
  Powered by Linux