Hi Ming,
I am using Asterisk bundled, which is PJSIP 2.8 with patches:
"Fixed #2191:
- Stricter double timer entry scheduling prevention.
- Integrate group lock in SIP transport, e.g: for add/dec ref, for timer scheduling."
I am have just had a further crash in a different location, which looks like its crashed due to a timer entry being cancelled and an error is happening while moving nodes. Is more locking required when performing operations on the timer heap?
#0 0x00007f7ae3dc0249 in copy_node (ht=0x184cbf0, slot=2305, moved_node=0x7f797732f0c0) at ../src/pj/timer.c:137
#1 0x00007f7ae3dc0677 in reheap_up (ht=0x184cbf0, moved_node=0x7f780e0f2990, slot=2305, parent=1152) at ../src/pj/timer.c:208
#2 0x00007f7ae3dc0882 in remove_node (ht=0x184cbf0, slot=2305) at ../src/pj/timer.c:254
parent = 1152
moved_node = 0x7f780e0f2990
removed_node = 0x7f797f4a9e08
#3 0x00007f7ae3dc0b64 in cancel (ht=0x184cbf0, entry=0x7f797f4a9e08, flags=1) at ../src/pj/timer.c:353
timer_node_slot = 2305
#4 0x00007f7ae3dc1061 in cancel_timer (ht=0x184cbf0, entry=0x7f797f4a9e08, flags=0, id_val=0) at ../src/pj/timer.c:591
count = 32634
#5 0x00007f7ae3dc10ea in pj_timer_heap_cancel (ht=0x184cbf0, entry=0x7f797f4a9e08) at ../src/pj/timer.c:611
#6 0x00007f7ae3d0f0c8 in pjsip_endpt_cancel_timer (endpt=0x184c908, entry=0x7f797f4a9e08) at ../src/pjsip/sip_endpoint.c:848
#7 0x00007f7ae3cf62f6 in set_timer (sub=0x7f797f4a9cc8, timer_id=2, seconds=3600) at ../src/pjsip-simple/evsub.c:508
#8 0x00007f7ae3cf91cc in on_tsx_state_uas (sub=0x7f797f4a9cc8, tsx=0x7f780e0f27d8, event=0x7f783ebf98c0) at ../src/pjsip-simple/evsub.c:2172
expires = 0x7f7a10f9e600
tdata = 0x7f799cbcc068
st_code = 200
st_text = 0x0
body = 0x0
old_state = PJSIP_EVSUB_STATE_ACTIVE
old_state_str = {ptr = 0x7f7ae3dc6f0c "ACTIVE", slen = 6}
reason = {ptr = 0x0, slen = 0}
rdata = 0x7f7a10f95a68
msg = 0x7f7a10f9dac0
res_hdr = {prev = 0x7f783ebf9700, next = 0x7f783ebf9700, type = 284812696, name = {ptr = 0x7f797e162ca8 "", slen = 284809960}, sname = {ptr = 0x7f797f4a9cc8 "evsub0x7f797f4a9cc8", slen = 140154425546736}, vptr = 0x7f7ae3cf7f2f <on_new_transaction+724>}
status = 0
event_hdr = 0x7f7a10f9e548
Kind regards,
Ross
From: pjsip <pjsip-bounces@xxxxxxxxxxxxxxx> on behalf of Ming <ming@xxxxxxxxx>
Sent: 30 May 2019 07:20 To: pjsip list Subject: Re: Segfault in timer.c Hi Ross,
- Is there any steps or specific scenario to reproduce the issue? - Which PJSIP version are you using? You mentioned ticket #2176, which additional patches do you apply to the base version you're using? - Our latest fix with regard to the timer is ticket #2191 (https://trac.pjsip.org/repos/ticket/2191). Please let us know if the problem still persists after this patch. Another ticket which may be relevant is ticket #2172 (https://trac.pjsip.org/repos/ticket/2172). - For the stack trace which you provided, the crash seems to be caused when accessing the node 0x7fa9c4c04c50. So, you will need to check the PJSIP log, with PJ_TIMER_DEBUG on, and see to which component the timer entry belongs, and when&how the pool which allocates the entry gets destroyed, and why isn't the timer cancelled first before the deallocation. Regards, Ming On Tue, Apr 9, 2019 at 9:04 PM Ross Beer <ross.beer@xxxxxxxxxxx> wrote: > > Hello, > > The previous segfault was with issue #2176 (https://trac.pjsip.org/repos/changeset/5934) patch applied, this patch appears to increase the frequency of segfaults in the 'pj_timer_heap_poll' process instead of mitigating them. > > Can anyone offer assistance in getting this resolved? > > Regards, > > Ross > > ________________________________ > From: pjsip <pjsip-bounces@xxxxxxxxxxxxxxx> on behalf of Ross Beer <ross.beer@xxxxxxxxxxx> > Sent: 08 April 2019 17:05 > To: pjsip@xxxxxxxxxxxxxxx > Subject: Segfault in timer.c > > Hi, > > We are seeing multiple segfaults while copying nodes: > > Thread 1 (Thread 0x7fa977fff700 (LWP 36699)): > #0 0x00007fac956676fe in copy_node (ht=0x16d0410, slot=430, moved_node=0x7fa9c4c04c50) at ../src/pj/timer.c:137 > #1 0x00007fac956679d9 in reheap_down (ht=0x16d0410, moved_node=0x7fab60031970, slot=430, child=862) at ../src/pj/timer.c:185 > #2 0x00007fac95667d1d in remove_node (ht=0x16d0410, slot=0) at ../src/pj/timer.c:252 > parent = 0 > moved_node = 0x7fab60031970 > removed_node = 0x7fa9b5978ae0 > #3 0x00007fac95668634 in pj_timer_heap_poll (ht=0x16d0410, next_delay=0x7fa977ffecd0) at ../src/pj/timer.c:634 > node = 0x100000002 > grp_lock = 0x31dc8c88 > now = {sec = 8157205, msec = 523} > count = 0 > #4 0x00007fac955b6b06 in pjsip_endpt_handle_events2 (endpt=0x16d0128, max_timeout=0x7fa977ffed40, p_count=0x0) at ../src/pjsip/sip_endpoint.c:715 > timeout = {sec = 0, msec = 0} > count = 0 > net_event_count = 0 > c = 0 > #5 0x00007fac955b6c4b in pjsip_endpt_handle_events (endpt=0x16d0128, max_timeout=0x7fa977ffed40) at ../src/pjsip/sip_endpoint.c:776 > #6 0x00007faac8024bcf in monitor_thread_exec (endpt=0x0) at res_pjsip.c:4512 > delay = {sec = 0, msec = 10} > #7 0x00007fac9564fcb0 in thread_main (param=0x1815b28) at ../src/pj/os_core_unix.c:541 > rec = 0x1815b28 > result = 0x0 > rc = 0 > #8 0x00007fac9342ddd5 in start_thread () at /usr/lib64/libpthread.so.0 > #9 0x00007fac927cfead in clone () at /usr/lib64/libc.so.6 > > > Can anyone assist in resolving the issue? > > Regards, > > Ross > _______________________________________________ > Visit our blog: http://blog.pjsip.org > > pjsip mailing list > pjsip@xxxxxxxxxxxxxxx > http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org _______________________________________________ Visit our blog: http://blog.pjsip.org pjsip mailing list pjsip@xxxxxxxxxxxxxxx http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org |
_______________________________________________ Visit our blog: http://blog.pjsip.org pjsip mailing list pjsip@xxxxxxxxxxxxxxx http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org