Re: backtrace a segfault

Finally I was able to reproduce the segfault on a smaller number of MPI processes (32 instead of 192), and the only way to locate the problem was to launch each process under gdb.
Not a very easy method, with all these terminals to manage.
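
For reference, one way to avoid juggling a terminal per process is to run gdb in batch mode, so that each rank writes its own backtrace to a log file. The following is only a sketch: it assumes Open MPI, which exports the rank in OMPI_COMM_WORLD_RANK, and the helper name gdb_rank_cmd and the log file names are hypothetical. It prints the command line for a rank instead of executing it.

```shell
#!/bin/sh
# Hypothetical helper: print the gdb command line for one MPI rank.
# $1 = rank number, remaining arguments = program and its arguments.
gdb_rank_cmd() {
    rank=$1
    shift
    # -batch: exit once the commands are done
    # -ex:    gdb commands executed in order (run the program, then backtrace)
    # --args: everything after it is the program and its arguments
    printf 'gdb -batch -ex run -ex bt --args %s > gdb_rank_%s.log 2>&1\n' "$*" "$rank"
}

# Assumption: Open MPI sets OMPI_COMM_WORLD_RANK in each launched process;
# other MPI launchers may use a different variable. Falls back to 0.
gdb_rank_cmd "${OMPI_COMM_WORLD_RANK:-0}" ./plante
```

A wrapper script that executes the printed line could then be started through mpirun in place of the program itself, leaving one log per rank to inspect after the crash.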

I was unable to get any useful information directly with the gcc/gfortran debug options.

Best regards

Patrick


Toon Moene wrote:
On 04/09/2015 09:06 AM, Patrick Begou wrote:

Hi,

I'm working on a large parallel Fortran application which (sometimes)
gives a segfault. When this error occurs I would like to backtrace
the call stack to know where it takes place, but I'm unable to get this
information: I get no more than a list of memory addresses. I've built a
small test case (with an array dimension error that causes a segmentation
fault in a subroutine) to investigate gfortran/gcc options.

With gcc version 4.8.2 using options "-g -fbacktrace -gdwarf-3" I get
./plante
Program received signal SIGSEGV: Segmentation fault - invalid memory
reference.
Backtrace for this error:
#0  0x7F99F71A9AC7
#1  0x7F99F71AA0CE
#2  0x7F99F67A9B2F
Segmentation fault

but addr2line -e ./plante 0x7F99F71AA0CE
returns: ??:0

What have I missed?

Hard to say. I have the same problem with a (far smaller) program of our weather forecasting suite. Compiled with gfortran 4.9 and linked against the OpenMPI libraries, I get this:

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x2ADE9148E407
#1  0x2ADE9148EA1E
#2  0x2ADE91F1C17F
#3  0x5EB24E in update_desc_ at update_desc.F90:55
#4  0x5E97D9 in swapoutdb_ at swapoutdb.F90:16 (discriminator 4)
#5  0x40B259 in bator at Bator.F90:368 (discriminator 2)
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 27638 on node super.moene.org exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

which looks reasonable to me. Perhaps the earlier addresses are simply within the OpenMPI library routines. The error almost certainly isn't in the library itself, but in what you passed to it as arguments.
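
One way to check whether an unresolved frame lies inside a shared library rather than in the executable is to compare its address with the mapping ranges listed in /proc/<pid>/maps while the process is alive. The sketch below does that comparison for a single range. The helper name lib_offset is hypothetical, and the range used in the example is made up for illustration; it is not taken from the runs above.

```shell
#!/bin/sh
# Hypothetical helper: lib_offset ADDR START END
# Prints ADDR relative to START if START <= ADDR < END, otherwise fails.
# All three arguments are hex strings without the 0x prefix.
lib_offset() {
    addr=$((0x$1))
    start=$((0x$2))
    end=$((0x$3))
    if [ "$addr" -ge "$start" ] && [ "$addr" -lt "$end" ]; then
        printf '0x%x\n' "$((addr - start))"
    else
        return 1
    fi
}

# Frame #1 of the first backtrace, tested against a MADE-UP mapping range.
lib_offset 7F99F71AA0CE 7f99f7190000 7f99f72b0000

# An address in the executable itself falls outside that range.
lib_offset 5EB24E 7f99f7190000 7f99f72b0000 || echo "not in this mapping"
```

If the address falls inside a library mapping that starts at file offset 0, the printed offset is roughly what addr2line expects when pointed at that library with -e. Whether it then resolves to a file and line still depends on the library having been built with debug information.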

Kind regards,



--
===================================================================
|  Equipe M.O.S.T.         |                                      |
|  Patrick BEGOU           | mailto:Patrick.Begou@xxxxxxxxxxxxxxx |
|  LEGI                    |                                      |
|  BP 53 X                 | Tel 04 76 82 51 35                   |
|  38041 GRENOBLE CEDEX    | Fax 04 76 82 52 71                   |
===================================================================




