On 20/09/16 19:47, Mahmood Naderan wrote:
> I ran the command from the compute node. I also set the number of the threads to 1.
>
> mahmood@compute-0-1:tran-bt-o-40$ ulimit -c unlimited
> mahmood@compute-0-1:tran-bt-o-40$ gdb --args /share/apps/siesta/openmpi-2.0.0/bin/mpirun -hostfile hosts.txt -np sc.sh
> GNU gdb (GDB) Red Hat Enterprise Linux (7.2-56.el6)
>
> Reading symbols from /share/apps/siesta/openmpi-2.0.0/bin/mpirun...done.
> (gdb) run
> ...
> ...
> [Thread 0x2aaaab447700 (LWP 32506) exited]
> [Thread 0x2aaaab246700 (LWP 32505) exited]
>
> Program exited with code 0204.
> Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.80.el6_3.6.x86_64 libibverbs-1.1.6-4.el6.x86_64 ibudev-147-2.42.el6.x86_64
> (gdb) disas
> No frame selected.
> (gdb) x/i $pc
> No registers.
> (gdb) q
> mahmood@compute-0-1:tran-bt-o-40$ ls -l core*
> -rw------- 1 mahmood nfsnobody 2342809600 Sep 20 23:12 core.5767
>
>> If you can get a core file, then run
>> $ gdb <binary> <core-file>
>
> So, please see the output
>
> mahmood@compute-0-1:tran-bt-o-40$ gdb /share/apps/siesta/openmpi-2.0.0/bin/mpirun core.5767
> GNU gdb (GDB) Red Hat Enterprise Linux (7.2-56.el6)
> Reading symbols from /share/apps/siesta/openmpi-2.0.0/bin/mpirun...done.
> warning: core file may not match specified executable file.
> [New Thread 5767]
> ..
> [New Thread 5784]
> Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
> Loaded symbols for /lib64/ld-linux-x86-64.so.2
> Core was generated by `/share/apps/siesta/siesta-4.0/tpar/transiesta'.
> Program terminated with signal 4, Illegal instruction.
> #0 0x00000000008d3a58 in ?? ()
> Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.80.el6_3.6.x86_64
> (gdb) disas
> No function contains program counter for selected frame.
> (gdb) mahmood@compute-0-1:tran-bt-o-40$ ls -l core*
> Undefined command: "mahmood". Try "help".
> (gdb) x/i $pc
> => 0x8d3a58: Cannot access memory at address 0x8d3a58
> (gdb)
>
> Do you have any idea? Still I am not able to see the illegal instruction
>

Sounds like the program has jumped off into the weeds. At this point I think you're going to have to start examining what's on the stack to see if you can find any clues (don't forget to look below the current SP as well as above it, since it may be a return operation from a corrupted stack).

I'm not sure there's much else I can add at this point. It's all down to detective work now.

R.

> Regards,
> Mahmood
>
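
The stack inspection suggested above can be sketched roughly as follows. In gdb, `info registers` shows the SP, `x/64gx $sp-256` dumps words on both sides of it, `info files` lists the address ranges of the loaded sections, and `info symbol <addr>` names whatever a candidate address points into. Note that the session above reports the core as generated by transiesta while gdb was started with mpirun, so these commands will probably only resolve anything if gdb is given the transiesta binary instead. The small Python helper below is only an illustration of the "detective work": it takes the text of such a dump and flags the words that look like code addresses. The function names, the sample dump and the text-segment bounds are all made up for the example, not taken from the real core.

```python
import re

HEX = re.compile(r"0x[0-9a-fA-F]+")


def parse_gdb_words(dump, word_size=8):
    """Turn the text printed by gdb's `x/<N>gx <addr>` into (address, value) pairs."""
    pairs = []
    for line in dump.splitlines():
        tokens = HEX.findall(line)
        if len(tokens) < 2:
            continue  # no address followed by at least one word on this line
        base = int(tokens[0], 16)
        for i, tok in enumerate(tokens[1:]):
            pairs.append((base + i * word_size, int(tok, 16)))
    return pairs


def candidate_code_pointers(pairs, text_start, text_end):
    """Keep the stack words whose value lies inside [text_start, text_end)."""
    return [(addr, val) for addr, val in pairs if text_start <= val < text_end]


# Made-up dump for illustration only; NOT taken from the real core file.
SAMPLE_DUMP = """\
0x7fffffffd000:\t0x0000000000000000\t0x00000000004f21b7
0x7fffffffd010:\t0x00007fffffffd040\t0x000000000045a1c2
"""

# Hypothetical text-segment bounds; read the real ones from `info files`.
TEXT_START, TEXT_END = 0x400000, 0x900000

if __name__ == "__main__":
    words = parse_gdb_words(SAMPLE_DUMP)
    for addr, val in candidate_code_pointers(words, TEXT_START, TEXT_END):
        print(f"{addr:#x} holds {val:#x}")
```

Each flagged value is only a candidate return address; feeding it to `info symbol` or `x/i` in gdb is what tells you whether it really points into a function.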