Re: GCC compiles but code crashes. Works w/ Intel compiler

On Sun, Jun 18, 2023 at 9:25 AM Ken Mankoff <mankoff@xxxxxxxxx> wrote:

> Hi Randy,
>
> > The Intel compiler works but gcc does not, is that correct?
>
> Yes.
>
> > If I understood correctly though, you're saying several other changes
> > besides the compiler swap were made.
>
> Unfortunately. I prefer to change one thing at a time, but was unable to
> do so here.
>
> > For the sake of argument assume there is a bug in the source code and
> > what's happening is code generated by gcc is revealing that bug. In
> > contrast, something the Intel compiler is doing lets it slip by.
>
> Seems reasonable - and thank you, this has given me an idea. I have a
> little bit of control over the Intel installation - I can try turning on a
> few debug flags, perhaps make it more sensitive, and perhaps get Intel to
> crash to find this (or other) bugs. Causing Intel to crash and then solving
> it may help here. It's at least a path forward.
>
> > Is there any way to narrow down the scope? Remove as much unrelated
> > source and compile steps as possible while still demonstrating the
> > problem?
>
> I haven't been able to do this yet. Part of this whole change from the
> Spack package manager is an attempt at simplification. We're working on it.
>
> > Finally I would caution against ~785th element becoming too narrow a
> > focus. Of course seeing data at the point of failure likely is telling
> > you something. But.... is it possible this is just a side effect? That
> > the actual error occurred earlier and went unnoticed?
>
> Agreed. But I'm working with what I have so far :).
>
> Anyway, I now have two ideas:
>
> 1) Try to get the Intel code to crash (through more restrictive compile
> options) and figure out why.
>
> 2) Perhaps I can get the non-Spack version compiling under Intel instead
> of gcc. This is closer to change-1-thing-at-a-time. I tried already and was
> not successful, but more work here may succeed, which would then mean I'm
> in total control of the Intel version, which would make idea (1) easier to
> work with.
>
> Thanks!
>
>   -k.
>

Hi Ken. The other thing I thought of after my reply was to try running the
program under a tool like valgrind. I haven't used valgrind much, but just
recently it found a buffer overrun that had been in our code for years. I
like both of your ideas. In theory the meaning of C code should not change
with the compiler, OS, or options. But... subtle bugs have ways of
hiding and then popping out upon such changes.
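
To make the "hiding" part concrete, here is a minimal sketch (not taken from
your code; every name in it is made up for illustration) of an off-by-one
write that can run silently for years. It places a guard byte right after the
usable part of a buffer and checks afterwards whether the guard was
clobbered, which is a crude hand-rolled version of what the memory checkers
do for you.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_N      64      /* largest usable length the harness supports */
#define GUARD_BYTE 0xA5    /* arbitrary pattern placed after the usable bytes */

/* Hypothetical routine with an off-by-one: it writes n + 1 bytes. */
static void fill_buggy(unsigned char *dst, size_t n)
{
    for (size_t i = 0; i <= n; i++)   /* bug: the condition should be i < n */
        dst[i] = 0;
}

/* The same routine with the loop bound corrected. */
static void fill_ok(unsigned char *dst, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = 0;
}

/* Run `fill` on n usable bytes followed by guard bytes.
 * Returns 1 if the first guard byte was overwritten, 0 otherwise.
 * The buffer is one byte larger than MAX_N, so the buggy write still
 * lands inside the array and the check itself is well defined. */
static int overruns(void (*fill)(unsigned char *, size_t), size_t n)
{
    unsigned char buf[MAX_N + 1];

    assert(n <= MAX_N);
    memset(buf, GUARD_BYTE, sizeof buf);
    fill(buf, n);
    return buf[n] != GUARD_BYTE;
}
```

Here overruns(fill_buggy, 16) returns 1 and overruns(fill_ok, 16) returns 0.
In a real program there is no guard byte, and whatever happens to live after
the buffer depends on the compiler and its options, which is one way a bug
can pass with one compiler and crash with another. As far as I know,
valgrind's default tool catches this kind of overrun on heap blocks, while
building with gcc's -fsanitize=address also covers stack and global arrays.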

You've got several things to explore that I'd label "top-down." The flip side
is "bottom-up," where you assume, for the sake of the effort, that the code
never worked. It is just an incomplete piece of software, and your job is to
happily dig deep at the crash point. If needed, generate an assembly
listing and carefully step through (on paper if necessary) the last few
instructions while looking at every input you can see in memory (heap and
stack) and in registers.
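
For the "look at every input" part, a small dump helper called just before
the failing statement can help. The sketch below is only an illustration: it
assumes the data is an array of double (I don't know your actual element
type), and dump_window is a made-up name. It prints the elements around a
suspicious index together with their raw bit patterns, so NaNs, denormals,
or uninitialized-looking garbage stand out.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* The bit-pattern dump below assumes a 64-bit double. */
_Static_assert(sizeof(double) == sizeof(uint64_t), "expects 64-bit double");

/* Print a[center - radius .. center + radius], clipped to [0, n), one
 * element per line with its value and raw bits. The line for `center`
 * is marked with '>'. Returns the number of elements printed. */
static size_t dump_window(FILE *out, const double *a, size_t n,
                          size_t center, size_t radius)
{
    assert(out != NULL && a != NULL);

    size_t lo = center > radius ? center - radius : 0;
    size_t hi = center + radius < n ? center + radius + 1 : n;
    size_t count = 0;

    for (size_t i = lo; i < hi; i++) {
        uint64_t bits;
        memcpy(&bits, &a[i], sizeof bits);   /* copy out the bits without aliasing tricks */
        fprintf(out, "%c a[%zu] = %-24.17g bits=0x%016" PRIx64 "\n",
                i == center ? '>' : ' ', i, a[i], bits);
        count++;
    }
    return count;
}
```

A call such as dump_window(stderr, data, len, 785, 4), with data and len
standing in for your own array and length, shows the element in question and
its neighbours. For the listing itself, gcc -S -fverbose-asm writes annotated
assembly for a source file, and objdump -d -S on a binary built with -g
interleaves source lines with the disassembly.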

My final bit of advice is to work to a point where nothing seems to make
sense, then... take a deep breath, go for a walk, and find a bit of rest. It
is amazing what a rested brain quietly churning on a problem can see.

Good luck. I'll be interested to hear how it goes. Cheers, -Randy



