GCC compiles but code crashes; works with Intel compiler

Hi,

I apologize in advance for a complex and poorly defined question/bug. I'd love to provide an MWE but cannot; I'm working with a very large, complex, historical codebase. We've coupled a Fortran NASA global climate model https://simplex.giss.nasa.gov/snapshots/ with a C++ ice sheet model https://www.pism.io/ .

Everything runs when these models (and their ~20 dependencies) are compiled with the Intel compiler via Spack on our supercomputer.

I'm trying to rebuild everything with GNU GCC. Each tool runs stand-alone when built with GCC, on both the supercomputer and my laptop.

I also now have all the dependencies rebuilt with GNU (lots of guesswork there). The coupled model runs for one day, then fails on day 2, when the coupling between the models happens for the first time. I've traced the error to roughly the 785th element of a 10,000-element array that blows up. I get the same error on both my laptop and our supercomputer.

When I add a PRINT statement to the Intel/Spack build, I see:

i: 780 deltah(i): 0.83826
i: 781 deltah(i): 0.849428
i: 782 deltah(i): 0.856929
i: 783 deltah(i): 0
i: 784 deltah(i): 0.910464
i: 785 deltah(i): 0.747764
i: 786 deltah(i): 0.774704
i: 787 deltah(i): 0.858931
i: 788 deltah(i): 0.823518
i: 789 deltah(i): 0.939335
i: 790 deltah(i): 0

And when I go to the same place in gdb on the GNU version after it crashes, I see the following. Note that the first four values shown here are almost identical to the Intel ones, as are indices 0 through 780; presumably those small differences are just hardware/compiler differences.

(gdb) frame 11    # move to icebin/slib/icebin/contracts/modele_pism.cpp:111
(gdb) p deltah.data_[780]@10
$22 = {0.83825857976891971, 0.84942598585903251, 0.85692695613342984,
  0, 0.41908462753569942, 3.3390526779554853e-313, 3.2211851062927428e-311,
  4.2653676902411122e-311, 1.8555380963204083e+251, 0 <repeats 11 times>}


There are of course many changes other than just Intel vs. GNU. When moving from Spack to Spackless, I had to figure out (or guess) how to handle ~30 dependencies: about 10 moved into a Conda/Mamba environment, and I hand-built the rest. I've guessed at many CMake and configure commands. There are lots of places where I could have introduced a problem, and I introduced plenty. But I've solved enough of them that it compiles, runs, and the data look good for the first ~785 elements of this array.

That makes me suspect, though I can't articulate exactly why, that this may be a compiler issue. If anyone has suggestions for how to debug this further, I'd be happy to hear them.
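One avenue I haven't fully explored is gfortran's runtime checking. Assuming the garbage originates on the Fortran side, rebuilding with something like the following might trap the first use of an uninitialized real (the exact flag selection here is my guess at what's relevant, not something I've verified on this codebase):

```shell
# Poison uninitialized reals with signaling NaNs, trap on their first use,
# and bounds-check array accesses (all gfortran-specific flags).
FFLAGS="-O0 -g -fcheck=all -finit-real=snan -ffpe-trap=invalid,zero,overflow"
```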

Thank you,

  Ken Mankoff


FYI, here is how I build 'icebin', the dependency where the crash occurs (although I believe `deltah`, a Blitz array, is populated elsewhere in Fortran; the crash itself is in C++).


CC="${LIME_ROOT}/opt/bin/mpicc" \
  CXX="${LIME_ROOT}/opt/bin/mpicxx" \
  FC="${LIME_ROOT}/opt/bin/mpif90" \
  PETSC_DIR="${LIME_ROOT}/src/petsc-3.7.7" \
  PETSC_ARCH="arch-linux2-c-debug" \
  cmake .. \
  -D CMAKE_INSTALL_PREFIX=${LIME_ROOT}/opt \
  -D CMAKE_C_FLAGS="-DNDEBUG -O0 -ggdb3 -fpermissive -fPIC -I${MAMBA_ENV}/meli/lib/python3.11/site-packages/numpy/core/include" \
  -D CMAKE_CXX_FLAGS="-DNDEBUG -O0 -ggdb3 -fpermissive -fPIC -I${MAMBA_ENV}/meli/lib/python3.11/site-packages/numpy/core/include" \
  -D CMAKE_PREFIX_PATH="${LIME_ROOT}/opt/include/boost:${MAMBA_ENV}/meli" \
  -D CMAKE_IGNORE_PATH="/usr;/lib;/usr/include;/usr/lib;/usr/lib64;/usr/bin" \
  -D BUILD_COUPLER=YES \
  -D BUILD_MODELE=YES \
  -D BUILD_GRIDGEN=YES \
  -D BUILD_PYTHON=YES \
  -D USE_PISM=YES \
  -D Boost_INCLUDE_DIR=${LIME_ROOT}/opt/include \
  -D Boost_INCLUDE_DIRS=${LIME_ROOT}/opt/include \
  -D Boost_LIBRARY_DIRS=${LIME_ROOT}/opt/lib \
  -D BLITZ_ROOT=${LIME_ROOT}/opt \
  -D BLITZ_LIBRARY=${LIME_ROOT}/opt/lib/libblitz.so \
  -D CGAL_LIBRARY=${LIME_ROOT}/opt/lib/libCGAL.so \
  -D CGAL_INCLUDE_DIR=${LIME_ROOT}/opt/include \
  -D CYTHON_EXECUTABLE=${MAMBA_ENV}/meli/bin/cython \
  -D EIGEN3_INCLUDE_DIR=${MAMBA_ENV}/meli/include/eigen3 \
  -D EVERYTRACE_c_REFADDR=${LIME_ROOT}/opt/lib \
  -D EVERYTRACE_INCLUDE_DIR=${LIME_ROOT}/opt/include \
  -D EVERYTRACE_LIBRARY=${LIME_ROOT}/opt/lib/libeverytrace.so \
  -D GMP_INCLUDE_DIR=${MAMBA_ENV}/meli/include \
  -D GMP_LIBRARY=${MAMBA_ENV}/meli/lib/libgmp.so \
  -D GTEST_LIBRARY_MAIN=${MAMBA_ENV}/meli/lib/libgtest.so \
  -D GTEST_INCLUDE_DIR=${MAMBA_ENV}/meli/include \
  -D IBMISC_ROOT=${LIME_ROOT}/opt \
  -D IBMISC_INCLUDE_DIR=${LIME_ROOT}/opt/include \
  -D IBMISC_LIBRARY=${LIME_ROOT}/opt/lib/libibmisc.so \
  -D MPFR_INCLUDES=${MAMBA_ENV}/meli/include \
  -D MPFR_LIBRARIES=${MAMBA_ENV}/meli/lib/libmpfr.so \
  -D MPIEXEC_EXECUTABLE=${LIME_ROOT}/opt/bin/mpiexec \
  -D MPI_C_COMPILER=${LIME_ROOT}/opt/bin/mpicc \
  -D MPI_CXX_COMPILER=${LIME_ROOT}/opt/bin/mpicxx \
  -D MPI_Fortran_COMPILER=${LIME_ROOT}/opt/bin/mpif90 \
  -D NETCDF_CXX4_LIBRARY=${LIME_ROOT}/opt/lib/libnetcdf-cxx4.so \
  -D NETCDF_CXX4_INCLUDE_DIR=${LIME_ROOT}/opt/include \
  -D PROJ4_INCLUDES=${MAMBA_ENV}/meli/include \
  -D PROJ4_LIBRARIES=${MAMBA_ENV}/meli/lib/libproj.so \
  -D PYTHON_EXECUTABLE=${MAMBA_ENV}/meli/bin/python \
  -D PYTHON_LIBRARY=${MAMBA_ENV}/meli/lib/libpython3.so \
  -D PYTHON_INCLUDES=${MAMBA_ENV}/meli/include/python3.11 \
  -D TCLAP_INCLUDE_DIR=${MAMBA_ENV}/meli/include \
  -D ZLIB_INCLUDE_DIR=${MAMBA_ENV}/meli/include \
  -D ZLIB_LIBRARY=${MAMBA_ENV}/meli/lib/libz.so \
  -Wno-dev

make -j
make install



