On Sun, Jun 18, 2023 at 8:52 AM Ken Mankoff via Gcc-help <gcc-help@xxxxxxxxxxx> wrote:

> Hi Jonathan,
>
> On 2023-06-18 at 01:10 -07, Jonathan Wakely <jwakely.gcc@xxxxxxxxx> wrote...
> > On Sun, 18 Jun 2023, 01:58 Ken Mankoff via Gcc-help, <gcc-help@xxxxxxxxxxx> wrote:
> >
> >> I'm trying to rebuild everything using GNU/gcc.
> >
> > What does this mean? Most people here are not familiar with Spack, and
> > I have no idea what it means to rebuild using GNU/gcc. Do you just
> > mean using gcc instead of the Intel compiler?
>
> By 'rebuild' I meant 'compile'. Yes - I'm trying to compile using gcc
> instead of Intel. Spack is a package manager. Perhaps not useful
> information.
>
> >> I also now have all the dependencies rebuilt with GNU (lots of
> >> guesswork there). It runs for 1 day. It fails on day 2 when the
> >> coupling between the models is done for the first time.
> >
> > Fails how?
> >
> > It crashes? How? What causes it to crash? What does gdb show?
>
> An array contains a value (1.8e+215) causing an assert to fail. I provided
> gdb output.
>
> Perhaps also useful - this same thing occurs on two different machines:
>
> $ lsb_release -a
> Description: Ubuntu 22.04.2 LTS
>
> $ uname -a
> Linux t480 5.15.0-72-generic #79-Ubuntu SMP Wed Apr 19 08:22:18 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
>
> $ gcc --version
> gcc (Ubuntu 11.3.0-1ubuntu1~22.04.1) 11.3.0
>
> And
>
> $ lsb_release -a
> Description: SUSE Linux Enterprise Server 12 SP5
> Release: 12.5
>
> $ uname -a
> Linux discover12 4.12.14-122.156-default #1 SMP Wed Apr 5 06:49:18 UTC 2023 (026e398) x86_64 x86_64 x86_64 GNU/Linux
>
> $ gcc --version
> gcc (GCC) 12.1.0
>
> -k.

Hmm... what an interesting problem. The Intel compiler works but gcc does not, is that correct? If so, narrowing down the difference between Intel and gcc could be key to unraveling the issue. If I understood correctly, though, you're saying several other changes besides the compiler swap were made.

For the sake of argument, assume there is a bug in the source code and that the code generated by gcc is revealing it, while something the Intel compiler does lets it slip by. We can disregard any concern about the GNU/Linux version or the specific gcc version, because SUSE and Ubuntu, with gcc 12.1.0 and 11.3.0 respectively, show the same problem.

Is there any way to narrow down the scope? Can you remove as much unrelated source and as many compile steps as possible while still demonstrating the problem?

Finally, I would caution against letting the ~785th element become too narrow a focus. Of course, the data at the point of failure is likely telling you something. But is it possible this is just a side effect, and that the actual error occurred earlier and went unnoticed?

Wish you all the best with this.

Cheers,
-Randy
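
To illustrate the "error occurred earlier" idea, here is a minimal sketch in C. The thread does not say what language the models are written in, so C is only an assumption, and the helper name `first_bad_index` and the `limit` threshold are hypothetical. The idea is to call a cheap sanity check on the array after each stage that writes to it, so the first stage that produces a value like 1.8e+215 is reported, rather than the later assert that merely trips over it.

```c
#include <math.h>    /* isfinite */
#include <stddef.h>  /* size_t */

/*
 * Hypothetical helper: scan an array for the first element that is
 * NaN, infinite, or outside the range [-limit, +limit].
 * Returns the index of that element, or -1 if every element looks sane.
 * The threshold "limit" is an assumption; pick one that is physically
 * plausible for the field being checked.
 */
long first_bad_index(const double *a, size_t n, double limit)
{
    for (size_t i = 0; i < n; i++) {
        if (!isfinite(a[i]) || a[i] > limit || a[i] < -limit) {
            return (long)i;
        }
    }
    return -1;
}
```

Calling something like `first_bad_index(field, n, 1e6)` after each step that touches the array, and printing the step name when the result is not -1, should show whether the bad value first appears during the coupling itself or was already present beforehand.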