Re: backtrace a segfault

On 10 April 2015 at 08:25, Patrick Begou
<Patrick.Begou@xxxxxxxxxxxxxxxxxxxx> wrote:
> I have also tested the -gdwarf-2 option with gcc 4.8.2. I get the same
> output from addr2line.
>
> Patrick
>
>
> Janne Blomqvist wrote:
>>
>> On Thu, Apr 9, 2015 at 8:48 PM, Toon Moene <toon@xxxxxxxxx> wrote:
>>>
>>> On 04/09/2015 09:06 AM, Patrick Begou wrote:
>>>
>>>> Hi,
>>>>
>>>> I'm working on a large parallel Fortran application which sometimes
>>>> gives a segfault. When this error occurs I would like to backtrace
>>>> the call stack to know where it takes place, but I'm unable to get this
>>>> information, only a list of memory addresses. I've built a small
>>>> test case (with an error in array dimension creating a segmentation
>>>> fault in a subroutine) to investigate gfortran/gcc options.
>>>>
>>>> With gcc version 4.8.2 using options "-g -fbacktrace -gdwarf-3" I get
>>>> ./plante
>>>> Program received signal SIGSEGV: Segmentation fault - invalid memory
>>>> reference.
>>>> Backtrace for this error:
>>>> #0  0x7F99F71A9AC7
>>>> #1  0x7F99F71AA0CE
>>>> #2  0x7F99F67A9B2F
>>>> Segmentation fault
>>>>
>>>> but addr2line -e ./plante 0x7F99F71AA0CE
>>>> returns: ??:0
>>>>
>>>> What have I missed?
>>
>> Depending on which version of binutils you have, addr2line may have
>> problems understanding the DWARF-3 debug format. Try with -gdwarf-2.
>>
>>
>>> Hard to say.  I have the same problem with a (far smaller) program of our
>>> weather forecasting suite. Compiled with gfortran 4.9 and linked against
>>> the OpenMPI libraries, I get this:
>>>
>>> Program received signal SIGSEGV: Segmentation fault - invalid memory
>>> reference.
>>>
>>> Backtrace for this error:
>>> #0  0x2ADE9148E407
>>> #1  0x2ADE9148EA1E
>>> #2  0x2ADE91F1C17F
>>> #3  0x5EB24E in update_desc_ at update_desc.F90:55
>>> #4  0x5E97D9 in swapoutdb_ at swapoutdb.F90:16 (discriminator 4)
>>> #5  0x40B259 in bator at Bator.F90:368 (discriminator 2)
>>>
>>> --------------------------------------------------------------------------
>>> mpirun noticed that process rank 0 with PID 27638 on node super.moene.org
>>> exited on signal 11 (Segmentation fault).
>>>
>>> --------------------------------------------------------------------------
>>>
>>> which looks reasonable to me.  Perhaps the earlier addresses are simply
>>> within the OpenMPI library routines.  The error most certainly isn't
>>> there, but in what you passed as arguments to it.
>>
>> Typically the first few stack frames are due to the backtrace printing
>> routines in libgfortran. Another problem is that addr2line can't
>> resolve addresses from shared libraries (incl. libgfortran.so);
>> linking statically is a workaround.

I'd definitely try static linking, yes. Maybe valgrind helps to
pinpoint the fault?
I take it you're aware of the hints given in
http://www.open-mpi.org/faq/?category=debugging
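
On the shared-library point above: the addresses in the backtrace are
runtime addresses, so before addr2line can do anything with them they
have to be matched to the object mapped at that location and rebased to
that object's load address. Below is a rough Python sketch of that step,
not a tested tool. It assumes you have a copy of /proc/<pid>/maps taken
from the same run (with address randomization the numbers change between
runs). The mapping lines, the /usr/lib64/libgfortran.so.3 path and the
locate() helper are made up for illustration, and it assumes the
library's first segment is linked at address 0, which is typical for
shared objects but worth checking with readelf -l.

```python
# Sketch: find which mapped file a backtrace address belongs to and
# compute the library-relative address to hand to addr2line.

# Hypothetical excerpt of /proc/<pid>/maps; real contents will differ.
SAMPLE_MAPS = """\
00400000-00401000 r-xp 00000000 08:01 1234 /home/user/plante
7f99f7190000-7f99f71a0000 r--p 00000000 08:01 5678 /usr/lib64/libgfortran.so.3
7f99f71a0000-7f99f72a0000 r-xp 00010000 08:01 5678 /usr/lib64/libgfortran.so.3
7f99f7400000-7f99f7421000 rw-p 00000000 00:00 0
"""


def locate(address, maps_text):
    """Return (path, address relative to load base), or None if unmapped."""
    regions = []
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping: no backing file to inspect
        start_s, end_s = fields[0].split("-")
        regions.append((int(start_s, 16), int(end_s, 16), fields[5]))

    for start, end, path in regions:
        if start <= address < end:
            # Load base = lowest mapping of the same file (assumes the
            # first segment of the object is linked at address 0).
            base = min(s for s, _, p in regions if p == path)
            return path, address - base
    return None


if __name__ == "__main__":
    hit = locate(0x7F99F71AA0CE, SAMPLE_MAPS)  # frame #1 from the post
    if hit is None:
        print("address not found in any file-backed mapping")
    else:
        path, rel = hit
        # Print the command to try rather than running it.
        print(f"addr2line -f -e {path} {rel:#x}")
```

Even with the right object and offset, the library itself needs debug
info for addr2line to report a file and line; otherwise you will likely
still get ??:0.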

HTH,



