Hello Nanang,
Thank you so much for the quick response!
I have validated your patch, and it's been running robust since Monday so I am confident that it really solves the problem.
I think that all your assumptions were correct; we sometimes do hangup active calls in a PJSIP callback.
There is only one observation I wanted to mention. I'm not sure if it is related to the patch, or the way we use PJSIP:
Sometimes, when our endpoint is under very high stress (incoming calls every 0.1 seconds), it happens that we do
get a pj::Account::onIncomingCall notification, but we successfully perform a call->answer in response.
But it looks like the call is disconnected by the remote so quickly, we never get a related DISCONNECTD event via
pj::Call::onCallState(). And in fact, PJSIP does seem to retain a handle on the call, with a NULL state.
This looks like a problem to me ... do you have any suggestions how we should deal with that situation?
Thanks very much!
Martin
On Thu, Sep 26, 2019 at 10:09 AM Nanang Izzuddin <nanang@xxxxxxxxx> wrote:
Sorry, forgot the patch._______________________________________________On Thu, Sep 26, 2019 at 2:53 PM Nanang Izzuddin <nanang@xxxxxxxxx> wrote:Hi Martin,The pjsip_inv_dec_ref() at the top of the traceback can only be called when invite session state is shifting to DISCONNECTED, while answering the call with 180 (as shown in the traceback) is normally cause invite session state to be shifted to EARLY, instead of DISCONNECTED, so perhaps the call was disconnected somewhere within the invite session callback on_state_changed() and pjsip_inv_dec_ref() was called twice. Attached should be the fix for such scenario, two modifications in the patch: avoid inv sess destroy in the callback by increasing ref count and avoid double pjsip_inv_dec_ref() by updating the DISCONNECTED state shifting check to evaluate the local var 'state' instead of 'inv->state'. Please let us know whether the patch works or not, thanks.Regarding the proposed patch, IMHO perhaps it works because of the additional pjsip_inv_add_ref(), while pjsip_inv_dec_ref(call->inv) is perhaps never called due to call->inv==NULL. It is just a wild guess though, perhaps you could confirm this better by putting breakpoint in pjsip_inv_dec_ref() or simply by adding a log line inside the "if (call->inv)" block.BR,nanangDear PJSIP experts,_______________________________________________Running PJSIP-2.7.1 on Linux-ARMHF, we have seen two very rare cases of the assertion pjsip_inv_dec_ref: Assertion `inv && inv->ref_cnt' failed. The full traceback is pasted below. Exactly the same traceback became 100% reproducible when I slowed down my SIP client by running it with the gperftools heap profiler so it looks like a subtle race condition of clients under stress.After detailed investigation, I came across https://trac.pjsip.org/repos/ticket/1959 , and I believe that exactly this issue happened in my case: SIP invitation was created in pjsua_call.c:1666 so call->inv is good there. But while the same thread swas still processing, another thread already deleted call->inv.I have come up with attached patch to fix the problem. It looks like a safe fix to me, using the existing refcount mechanism to address the problem. Does it look reasonable? Should it be included in PJSIP? Or is the root cause perhaps something else ... which Thread could interfere with the ongoing invite? Is it perhaps fixed in trunk already?Thanks for any feedback!Here is the full traceback for reference:#0 0x76e59398 in pjsip_inv_dec_ref (inv=0x4f4b214) at ../src/pjsip-ua/sip_inv.c:245#1 0x76e5d810 in inv_set_state (inv=0x4f4b214, state=<optimized out>, e=<optimized out>) at ../src/pjsip-ua/sip_inv.c:323
#2 0x76e5967c in mod_inv_on_tsx_state (tsx=0x4f39864, e=0x6ead64ac) at ../src/pjsip-ua/sip_inv.c:717
#3 0x76e0faf8 in pjsip_dlg_on_tsx_state (dlg=0x4f32864, tsx=0x4f39864, e=0x6ead64ac) at ../src/pjsip/sip_dialog.c:2066
#4 0x76e07d38 in tsx_set_state (tsx=tsx@entry=0x4f39864, state=state@entry=PJSIP_TSX_STATE_PROCEEDING, event_src_type=event_src_type@entry=PJSIP_EVENT_TX_MSG, event_src=event_src@entry=0x4ef0064,
flag=flag@entry=0) at ../src/pjsip/sip_transaction.c:1267
#5 0x76e0a280 in tsx_on_state_proceeding_uas (tsx=0x4f39864, event=<optimized out>) at ../src/pjsip/sip_transaction.c:2710
#6 0x76e0c1e0 in pjsip_tsx_send_msg (tsx=tsx@entry=0x4f39864, tdata=tdata@entry=0x4ef0064) at ../src/pjsip/sip_transaction.c:1789
#7 0x76e0f334 in pjsip_dlg_send_response (dlg=0x4f32864, tsx=0x4f39864, tdata=tdata@entry=0x4ef0064) at ../src/pjsip/sip_dialog.c:1531
#8 0x76e5d6f8 in pjsip_inv_send_msg (inv=0x4f4b214, tdata=0x4ef0064) at ../src/pjsip-ua/sip_inv.c:3231
#9 0x76e967f8 in pjsua_call_answer2 (call_id=0, opt=0x0, code=180, reason=0x0, msg_data=0x0) at ../src/pjsua-lib/pjsua_call.c:2317
#10 0x76e9694c in process_pending_call_answer (call=call@entry=0x76ed5aa8 <pjsua_var+19928>) at ../src/pjsua-lib/pjsua_call.c:978
#11 0x76e98280 in pjsua_call_on_incoming (rdata=rdata@entry=0x2b6c014) at ../src/pjsua-lib/pjsua_call.c:1666
#12 0x76e9b698 in mod_pjsua_on_rx_request (rdata=<optimized out>) at ../src/pjsua-lib/pjsua_core.c:582
#13 0x76df29fc in pjsip_endpt_process_rx_data (endpt=0x29b1064, rdata=0x2b6c014, p=<optimized out>, p_handled=0x6ead6894) at ../src/pjsip/sip_endpoint.c:893
#14 0x76df2bd4 in endpt_on_rx_msg (endpt=0x29b1064, status=<optimized out>, rdata=0x2b6c014) at ../src/pjsip/sip_endpoint.c:1043
#15 0x76dfa000 in pjsip_tpmgr_receive_packet (mgr=0x2ade678, rdata=rdata@entry=0x2b6c014) at ../src/pjsip/sip_transport.c:1965
#16 0x76dfcdc0 in udp_on_read_complete (key=0x2add330, op_key=<optimized out>, bytes_read=1060) at ../src/pjsip/sip_transport_udp.c:170
#17 0x76c567b4 in ioqueue_dispatch_read_event (ioqueue=<optimized out>, h=0x2add330) at ../src/pj/ioqueue_common_abs.c:605
#18 0x76c581c8 in pj_ioqueue_poll (ioqueue=0x2add080, timeout=timeout@entry=0x6ead6d28) at ../src/pj/ioqueue_epoll.c:812
#19 0x76df26fc in pjsip_endpt_handle_events2 (endpt=0x29b1064, max_timeout=max_timeout@entry=0x6ead6d60, p_count=p_count@entry=0x6ead6d5c) at ../src/pjsip/sip_endpoint.c:744
#20 0x76e9c704 in pjsua_handle_events (msec_timeout=<optimized out>) at ../src/pjsua-lib/pjsua_core.c:1988
#21 0x76e9c744 in worker_thread (arg=<optimized out>) at ../src/pjsua-lib/pjsua_core.c:704
#22 0x76c59bec in thread_main (param=0x2aa04d4) at ../src/pj/os_core_unix.c:541
#23 0x76a5d07c in start_thread (arg=0x6ead73b0) at pthread_create.c:339
#24 0x422408e0 in ?? () at ../sysdeps/unix/sysv/linux/arm/clone.S:89 from /lib/libc.so.6
Visit our blog: http://blog.pjsip.org
pjsip mailing list
pjsip@xxxxxxxxxxxxxxx
http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org
Visit our blog: http://blog.pjsip.org
pjsip mailing list
pjsip@xxxxxxxxxxxxxxx
http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org
_______________________________________________ Visit our blog: http://blog.pjsip.org pjsip mailing list pjsip@xxxxxxxxxxxxxxx http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org