Re: gcc 3.3.5 valid floating point operations randomly produce NaN

"Lionel B" <lionelbuk@xxxxxxxxxxx> · Wed, 8 Jun 2005 11:59:43 +0100

Joseph Bebel wrote:
> Hello,
>
> I am currently using gcc 3.3.5 (on gentoo gnu/linux on athlonxp arch)
> to compile a sizable, computationally intensive but not otherwise
> large program (~6000 lines) of Gnu99 C code (-std=gnu99), and in some
> functions in particular it appears that otherwise valid floating point
> operations return NaN instead of the desired computation.

Hi Joseph,

From your description of the problem, everything points to this being
not a floating-point issue, but rather Undefined Behaviour. My first
suspicion would be dodgy memory management (buffer overrun,
mis-allocation, ...); i.e. that some part of your program is stomping
on some area of memory it shouldn't be...

Pointers (pun fully intended!) for suspecting this to be the case include:

1) The "random" nature of the problem

2) Inability to replicate in a simple program

3) Inconsistent behaviour with gdb

4) Dependency on compiler options

To put it simply: I suspect you have a bug in your code.

> For example, the operation:
>
> long double x = (some valid float value, like 0.22532);
> long double x0 = (some other valid float like 0.6364);
> long double result = x - x0;
> printf("%Lf\n", result);
>
> would sometimes print NaN, sometimes the correct value. If I inspect
> the value of result in the gdb debugger following the computation it
> also says NaN. If I perform the computation in gdb (with command
> "print x - x0") it prints the valid result, which is why this bug is
> so painful.
>
> The program uses a combination of long doubles, doubles, and floats.
> The problem does not seem to distinguish or change itself by
> selecting a different type. I have been unable to replicate the
> problem however, in a small test program, which makes me suspect the
> size of the program (with many function calls and stack variables) to
> be the culprit, but there is no stack overflow error.
>
> It seems that the location of the problems is random, though mostly
> concentrated on subtraction operations for some reason. However it is
> deterministic (same or similar code causes same problem in same
> location), though it is not the same when changing compiler options.
> (i.e., changing from -g -ggdb to -O2 changes location of problem, not
> fixing it)
>
> If anybody has a theory of what is triggering this behavior, please
> let me know. Also, please include my email in the reply as I am not
> subscribed. Perhaps I need to upgrade the compiler, but first I would
> like to see if it is my error which caused this.
>
> Thank you
> JB
>
> Here is the list of compiler options tried:
> -lpthread -W -Wall -Wfloat-equal -std=gnu99 -g -ggdb -march=i686
> -lpthread -W -Wall -Wfloat-equal -std=gnu99 -g -ggdb -march=athlon-xp
> -lpthread -W -Wall -Wfloat-equal -std=gnu99 -g -ggdb
> -lpthread -W -Wall -Wfloat-equal -std=gnu99 -O2
>
> with various other custom static libraries linked, etc.

-- 
Lionel B