Re: Segfault in timer.c

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Ross,

I'm afraid the backtrace alone is not sufficient to pinpoint the cause. We know that the current timer implementation is not easy to troubleshoot, and we're thinking of changing it, but for now in order to investigate the issue further, we will need to get additional information which I mentioned in the previous email, such as the steps to reproduce, and/or the complete PJSIP logs level 6 compiled with the timer debugging on.

And I don't think adding locks can help here, because it looks like the issue is caused by a dangling timer entry, i.e. a timer entry whose owner/scheduler has been deleted.

Best regards,
Ming

On Thu, May 30, 2019 at 9:16 PM Ross Beer <ross.beer@xxxxxxxxxxx> wrote:
Hi Ming,

I am using Asterisk bundled, which is PJSIP 2.8 with patches:

"Fixed #2191:   
- Stricter double timer entry scheduling prevention.  
- Integrate group lock in SIP transport, e.g: for add/dec ref,     for timer scheduling."

I am have just had a further crash in a different location, which looks like its crashed due to a timer entry being cancelled and an error is happening while moving nodes. Is more locking required when performing operations on the timer heap?

Thread 1 (Thread 0x7f783ebfa700 (LWP 83829)):
#0  0x00007f7ae3dc0249 in copy_node (ht=0x184cbf0, slot=2305, moved_node=0x7f797732f0c0) at ../src/pj/timer.c:137
#1  0x00007f7ae3dc0677 in reheap_up (ht=0x184cbf0, moved_node=0x7f780e0f2990, slot=2305, parent=1152) at ../src/pj/timer.c:208
#2  0x00007f7ae3dc0882 in remove_node (ht=0x184cbf0, slot=2305) at ../src/pj/timer.c:254
        parent = 1152
        moved_node = 0x7f780e0f2990
        removed_node = 0x7f797f4a9e08
#3  0x00007f7ae3dc0b64 in cancel (ht=0x184cbf0, entry=0x7f797f4a9e08, flags=1) at ../src/pj/timer.c:353
        timer_node_slot = 2305
#4  0x00007f7ae3dc1061 in cancel_timer (ht=0x184cbf0, entry=0x7f797f4a9e08, flags=0, id_val=0) at ../src/pj/timer.c:591
        count = 32634
#5  0x00007f7ae3dc10ea in pj_timer_heap_cancel (ht=0x184cbf0, entry=0x7f797f4a9e08) at ../src/pj/timer.c:611
#6  0x00007f7ae3d0f0c8 in pjsip_endpt_cancel_timer (endpt=0x184c908, entry=0x7f797f4a9e08) at ../src/pjsip/sip_endpoint.c:848
#7  0x00007f7ae3cf62f6 in set_timer (sub=0x7f797f4a9cc8, timer_id=2, seconds=3600) at ../src/pjsip-simple/evsub.c:508
#8  0x00007f7ae3cf91cc in on_tsx_state_uas (sub=0x7f797f4a9cc8, tsx=0x7f780e0f27d8, event=0x7f783ebf98c0) at ../src/pjsip-simple/evsub.c:2172
        expires = 0x7f7a10f9e600
        tdata = 0x7f799cbcc068
        st_code = 200
        st_text = 0x0
        body = 0x0
        old_state = PJSIP_EVSUB_STATE_ACTIVE
        old_state_str = {ptr = 0x7f7ae3dc6f0c "ACTIVE", slen = 6}
        reason = {ptr = 0x0, slen = 0}
        rdata = 0x7f7a10f95a68
        msg = 0x7f7a10f9dac0
        res_hdr = {prev = 0x7f783ebf9700, next = 0x7f783ebf9700, type = 284812696, name = {ptr = 0x7f797e162ca8 "", slen = 284809960}, sname = {ptr = 0x7f797f4a9cc8 "evsub0x7f797f4a9cc8", slen = 140154425546736}, vptr = 0x7f7ae3cf7f2f <on_new_transaction+724>}
        status = 0
        event_hdr = 0x7f7a10f9e548

Kind regards,

Ross


From: pjsip <pjsip-bounces@xxxxxxxxxxxxxxx> on behalf of Ming <ming@xxxxxxxxx>
Sent: 30 May 2019 07:20
To: pjsip list
Subject: Re: Segfault in timer.c
 
Hi Ross,

- Is there any steps or specific scenario to reproduce the issue?
- Which PJSIP version are you using? You mentioned ticket #2176, which
additional patches do you apply to the base version you're using?
- Our latest fix with regard to the timer is ticket #2191
(https://trac.pjsip.org/repos/ticket/2191). Please let us know if the
problem still persists after this patch. Another ticket which may be
relevant is ticket #2172 (https://trac.pjsip.org/repos/ticket/2172).
- For the stack trace which you provided, the crash seems to be caused
when accessing the node 0x7fa9c4c04c50. So, you  will need to check
the PJSIP log, with PJ_TIMER_DEBUG on, and see to which component the
timer entry belongs, and when&how the pool which allocates the entry
gets destroyed, and why isn't the timer cancelled first before the
deallocation.

Regards,
Ming


On Tue, Apr 9, 2019 at 9:04 PM Ross Beer <ross.beer@xxxxxxxxxxx> wrote:
>
> Hello,
>
> The previous segfault was with issue #2176 (https://trac.pjsip.org/repos/changeset/5934) patch applied, this patch appears to increase the frequency of segfaults in the 'pj_timer_heap_poll' process instead of mitigating them.
>
> Can anyone offer assistance in getting this resolved?
>
> Regards,
>
> Ross
>
> ________________________________
> From: pjsip <pjsip-bounces@xxxxxxxxxxxxxxx> on behalf of Ross Beer <ross.beer@xxxxxxxxxxx>
> Sent: 08 April 2019 17:05
> To: pjsip@xxxxxxxxxxxxxxx
> Subject: Segfault in timer.c
>
> Hi,
>
> We are seeing multiple segfaults while copying nodes:
>
> Thread 1 (Thread 0x7fa977fff700 (LWP 36699)):
> #0  0x00007fac956676fe in copy_node (ht=0x16d0410, slot=430, moved_node=0x7fa9c4c04c50) at ../src/pj/timer.c:137
> #1  0x00007fac956679d9 in reheap_down (ht=0x16d0410, moved_node=0x7fab60031970, slot=430, child=862) at ../src/pj/timer.c:185
> #2  0x00007fac95667d1d in remove_node (ht=0x16d0410, slot=0) at ../src/pj/timer.c:252
>         parent = 0
>         moved_node = 0x7fab60031970
>         removed_node = 0x7fa9b5978ae0
> #3  0x00007fac95668634 in pj_timer_heap_poll (ht=0x16d0410, next_delay=0x7fa977ffecd0) at ../src/pj/timer.c:634
>         node = 0x100000002
>         grp_lock = 0x31dc8c88
>         now = {sec = 8157205, msec = 523}
>         count = 0
> #4  0x00007fac955b6b06 in pjsip_endpt_handle_events2 (endpt=0x16d0128, max_timeout=0x7fa977ffed40, p_count=0x0) at ../src/pjsip/sip_endpoint.c:715
>         timeout = {sec = 0, msec = 0}
>         count = 0
>         net_event_count = 0
>         c = 0
> #5  0x00007fac955b6c4b in pjsip_endpt_handle_events (endpt=0x16d0128, max_timeout=0x7fa977ffed40) at ../src/pjsip/sip_endpoint.c:776
> #6  0x00007faac8024bcf in monitor_thread_exec (endpt=0x0) at res_pjsip.c:4512
>         delay = {sec = 0, msec = 10}
> #7  0x00007fac9564fcb0 in thread_main (param=0x1815b28) at ../src/pj/os_core_unix.c:541
>         rec = 0x1815b28
>         result = 0x0
>         rc = 0
> #8  0x00007fac9342ddd5 in start_thread () at /usr/lib64/libpthread.so.0
> #9  0x00007fac927cfead in clone () at /usr/lib64/libc.so.6
>
>
> Can anyone assist in resolving the issue?
>
> Regards,
>
> Ross
> _______________________________________________
> Visit our blog: http://blog.pjsip.org
>
> pjsip mailing list
> pjsip@xxxxxxxxxxxxxxx
> http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org

_______________________________________________
Visit our blog: http://blog.pjsip.org

pjsip mailing list
pjsip@xxxxxxxxxxxxxxx
http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org
_______________________________________________
Visit our blog: http://blog.pjsip.org

pjsip mailing list
pjsip@xxxxxxxxxxxxxxx
http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org
_______________________________________________
Visit our blog: http://blog.pjsip.org

pjsip mailing list
pjsip@xxxxxxxxxxxxxxx
http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org

[Index of Archives]     [Asterisk Users]     [Asterisk App Development]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [Linux API]
  Powered by Linux