Thanks for the information. I think I understand what you said about
the line number; less so what you said about statements comparing as
equal. I don't understand how that might fit into the process of
optimizing.
The problem isn't even consistent for perfectly identical (massive)
source code. Checking build logs, I found that the same error happened
twice out of the several times the identical source code was compiled.
When I reran the compile outside the build system (same source code and
same g++ command), it didn't happen again.
Do any of the optimizations make decisions based on the available
physical RAM of the system on which the compile is running?
The system has 8GB of physical RAM plus plenty of swap space. On the
two occasions when it failed, it was one of 8 large compiles running.
Under such conditions almost all of the 8GB is used by the compile
processes, memory for caching is pushed down to just a few hundred MB,
and a moderate amount of swap space is used. (That is a non-optimal
condition: running fewer compiles at once would get them all done a
little faster. But our build process can't factor size in when deciding
how many compilers to run in parallel, and most compiles take less
memory, so 8-way parallelism is optimal on that system.)
I'm just mentioning the performance issues, not directly asking about
them. What I'm really asking about is whether the memory pressure
affects the optimization decisions rather than just the performance.
There are some monstrously large functions in which optimization could
take unreasonable amounts of time and/or memory (the function ending
right before the reported line is one of those). If external memory
pressure (from other processes) affects optimizer choices, that could
explain why this failure happened only twice out of several compiles of
the same source code. If the compiler is supposed to be more
deterministic than that, it may be much harder to explain.
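If GCC's garbage-collector tuning really is derived from the machine's
memory (I believe the defaults of the ggc-min-expand and
ggc-min-heapsize parameters are computed from detected RAM, though I
may be wrong about that), then pinning those parameters explicitly
would remove at least that source of run-to-run variation. A sketch,
with placeholder values and an illustrative file name:

    # Sketch: pin the garbage-collector parameters whose defaults are
    # (I believe) derived from detected RAM, so every run uses the
    # same tuning. Values and file name are placeholders, not
    # recommendations.
    g++ -O2 --param ggc-min-expand=30 --param ggc-min-heapsize=65536 \
        -c monster.cpp -o monster.o

If the failure stopped (or became reproducible) with those pinned,
that would at least tell me whether memory-derived tuning is involved.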
I'm used to non-deterministic behavior from the Intel 10 compiler on
Windows when running 8 copies in parallel (on better processors with
16GB of RAM). It's easier to run a cleanup (retry) pass for the
failures at the end of the build script than to solve them or to run
fewer in parallel.
I guess now I need the same for gcc and Linux. When I wrote the first
email about this I didn't yet realize that the builds of that source
which had worked were from source code perfectly identical to the two
that failed. That's bad news for any hope of solving it (though I'm
open to suggestions) but good news for my build process: I don't really
need to solve it, just run a cleanup pass for the failure(s). (So far
just one of MANY files in the build process has failed, and it
presumably fails only with low probability.)
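For illustration, assuming a make-driven build (target names are made
up), the cleanup pass could be as simple as rerunning make serially
after the parallel pass, so anything that failed is rebuilt alone with
the whole machine's memory available to it:

    # Sketch, assuming a make-driven build with an 'all' target.
    # -k lets the parallel pass keep going past failures; the second,
    # serial invocation rebuilds only the targets that failed.
    make -j8 -k all || true
    make all

Anything that still fails on the serial pass is probably not a
load-related failure and would be worth looking at separately.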
Ian Lance Taylor wrote:
> The line number is meaningless except to indicate the function--the
> error could be triggered by any of the code in that function or in
> functions that it calls which have been inlined.
>
> The specific problem seems to be that two statements compared as equal
> but had a different hash code. I have no idea what caused that--it
> could even be due to memory corruption elsewhere in the compiler.
>
> This probably won't help much, but I expect that you can avoid this by
> compiling without optimization.
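If it comes to that, the retry pass could presumably just drop the
optimization level for the one failing file rather than for the whole
build; something like this (the file name is illustrative):

    # Sketch: recompile only the problem file without optimization,
    # leaving the rest of the build at the normal level. A later -O0
    # overrides any earlier -O flag in $CXXFLAGS.
    g++ $CXXFLAGS -O0 -c monster.cpp -o monster.o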