Hi,

I am writing a server app using the pjsip library. My app forks multiple threads that call pjsip_endpt_handle_events() to process pjsip events. The log file of my app shows that multiple threads can (easily) poll different timer events belonging to the same SIP transaction concurrently, which in some cases is fatal!

From the pjsip timer.c source code, pj_timer_heap_poll() unlocks the timer heap immediately after it removes a timer event node from the heap and immediately before it calls the node's callback. At this point, a fatal scenario can occur like this:

=> thread #1 removes the "Retransmit timer event" (e.g., corresponding to timer A (or E) of an INVITE (or BYE) transaction)
=> thread #1 calls the logger function to print "Retransmit timer event" and yields the CPU to thread #2
=> thread #2 removes the "Timeout timer event" (e.g., corresponding to timer B (or F) of an INVITE (or BYE) transaction)
=> thread #2 destroys the transaction object
=> thread #1 resumes execution, tries to resend the BYE or do something else on the transaction object, and crashes since the transaction no longer exists!

Below I am attaching a piece of the log file that shows this failure with an INVITE transaction during a load test of my app. The log messages are shown in "logged" (committed) order which, due to thread preemption, is not necessarily ascending timestamp order. (The first column is the thread ID and the second is the timestamp.)

40868940>20:44:08.832> (pj5) 20:44:08.832 tsx0x2aaab804b Incoming Response msg 200/INVITE/cseq=23261 (rdata0xf57f9a8) in state Proceeding
40868940>20:44:08.832> (pj5) 20:44:08.832 tsx0x2aaab804b State changed from Proceeding to Terminated, event=RX_MSG
40868940>20:44:08.832> (pj5) 20:44:08.832 dlg0x2aaab8042 Transaction tsx0x2aaab804b198 state changed to Terminated
40868940>20:44:08.834> (pj5) 20:44:08.834 tsx0x2aaab804b Timeout timer event
40868940>20:44:08.834> (pj5) 20:44:08.834 tsx0x2aaab804b State changed from Terminated to Destroyed, event=TIMER
40868940>20:44:08.834> (pj5) 20:44:08.834 tsx0x2aaab804b Transaction destroyed!
407E6940>20:44:08.761> (pj5) 20:44:08.761 tsx0x2aaab804b Retransmit timer event
407E6940>20:44:08.872> __assert_fail> ../src/pjsip/sip_transaction.c:tsx_on_state_destroyed(3217)> !"Not expecting any events!!"!!
407E6940>20:44:08.872> oops> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
407E6940>20:44:08.872> oops> oops: sig=11 pid=7cdb
407E6940>20:44:08.872> oops> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

From the timestamps, it is clear that thread 407E6940 removed the "Retransmit timer event" from the timer heap 73 ms before thread 40868940 removed the "Timeout timer event". However, when thread 407E6940 tried to log the event, it was suspended by the Linux thread scheduler, and by the time it got the CPU back to process the retransmit timer event, thread 40868940 had already destroyed the transaction object, which caused thread 407E6940 to crash.

Since both the retransmit timer event and the timeout timer event are handled entirely inside pjsip, my app has no way to "help" synchronize the destruction of the transaction object with the retransmission of a message.

Thanks in advance for any comments.

-Peter
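
To make the interleaving above concrete, here is a minimal, self-contained C sketch that replays it in a fixed order and shows one possible way such a race is commonly avoided: each pending timer entry holds its own reference to the object, so a callback that fires late still sees valid memory and can check a "destroyed" flag. This is only an illustration under my own assumptions. None of the demo_* names are pjsip types or functions, and I am not claiming this is how pjsip does or should handle it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a transaction object; NOT a pjsip type. */
typedef struct demo_tsx {
    atomic_int refcount;   /* owner's reference + one per pending timer entry */
    atomic_int destroyed;  /* set when the state machine reaches "Destroyed" */
    int retransmits;       /* retransmissions actually performed */
} demo_tsx;

/* Counts real frees; plain int because the replay below is single-threaded. */
int demo_frees = 0;

demo_tsx *demo_tsx_create(int pending_timers)
{
    demo_tsx *t = malloc(sizeof *t);
    if (!t) return NULL;
    atomic_init(&t->refcount, 1 + pending_timers);
    atomic_init(&t->destroyed, 0);
    t->retransmits = 0;
    return t;
}

/* Drop one reference; only the last holder frees the memory. */
void demo_tsx_release(demo_tsx *t)
{
    if (atomic_fetch_sub(&t->refcount, 1) == 1) {
        free(t);
        demo_frees++;
    }
}

/* Timeout timer callback: logically destroys the transaction. */
void demo_on_timeout(demo_tsx *t)
{
    atomic_store(&t->destroyed, 1);
    demo_tsx_release(t);   /* reference held by the timeout timer entry */
    demo_tsx_release(t);   /* owner's reference */
}

/* Retransmit timer callback: returns 1 if it retransmitted, 0 if it skipped. */
int demo_on_retransmit(demo_tsx *t)
{
    int sent = 0;
    if (!atomic_load(&t->destroyed)) {
        t->retransmits++;  /* real code would resend the request here */
        sent = 1;
    }
    demo_tsx_release(t);   /* reference held by the retransmit timer entry */
    return sent;
}

/* Replays the order seen in the log. Returns 1 if the late retransmit
 * callback was skipped safely and the object was freed exactly once. */
int demo_replay_race(void)
{
    int frees_before = demo_frees;
    demo_tsx *t = demo_tsx_create(2);  /* retransmit + timeout timers pending */
    if (!t) return -1;

    /* thread #1: has removed the retransmit entry, then gets preempted */
    /* thread #2: removes the timeout entry and runs its callback */
    demo_on_timeout(t);                /* refcount 3 -> 1, memory still valid */
    /* thread #1: resumes and runs the retransmit callback */
    int sent = demo_on_retransmit(t);  /* sees destroyed, skips, frees object */

    return sent == 0 && demo_frees == frees_before + 1;
}
```

Calling demo_replay_race() returns 1 for this ordering. Note that in truly concurrent code the check of the destroyed flag and the retransmission itself would also need to happen under a per-object lock; the sketch leaves that out because it runs the steps sequentially.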