Joseph Bebel wrote:
> Hello,
>
> I am currently using gcc 3.3.5 (on gentoo gnu/linux on athlonxp arch)
> to compile a sizable, computationally intensive but not otherwise
> large program (~6000 lines) of Gnu99 C code (-std=gnu99), and in some
> functions in particular it appears that otherwise valid floating point
> operations return NaN instead of the desired computation.

Hi Joseph,

From your description of the problem, everything points to this being
not a floating point issue, but rather Undefined Behaviour. My first
suspicion would be dodgy memory management (buffer overrun,
mis-allocation, ...); i.e. that some part of your program is stomping
on some area of memory it shouldn't be...

Pointers (pun fully intended!) for suspecting this to be the case
include:

1) The "random" nature of the problem
2) Inability to replicate in a simple program
3) Inconsistent behaviour with gdb
4) Dependency on compiler options

To put it simply: I suspect you have a bug in your code.

> For example, the operation:
>
> long double x = (some valid float value, like 0.22532);
> long double x0 = (some other valid float like 0.6364);
> long double result = x - x0;
> printf("%Lf\n", result);
>
> would sometimes print NaN, sometimes the correct value. If I inspect
> the value of result in the gdb debugger following the computation it
> also says NaN. If I perform the computation in gdb (with command
> "print x - x0") it prints the valid result, which is why this bug is
> so painful.
>
> The program uses a combination of long doubles, doubles, and floats.
> The problem does not seem to distinguish or change itself by
> selecting a different type. I have been unable to replicate the
> problem however, in a small test program, which makes me suspect the
> size of the program(with many function calls and stack variables) to
> be the culprit, but there is not stack overflow error.
>
> It seems that the location of the problems is random, though mostly
> concentrated on subtraction operations for some reason. However it is
> deterministic (same or similar code causes same problem in same
> location), though it is not the same when changing compiler options.
> (i.e., changing from -g -ggdb to -O2 changes location of problem, not
> fixing it)
>
> If anybody has a theory of what is triggering this behavior, please
> let me know. Also, please include my email in the reply as I am not
> subscribed. Perhaps I need to upgrade the compiler, but first I would
> like to see if it is my error which caused this.
>
> Thank you
> JB
>
> Here is the list of compiler options tried:
> -lpthread -W -Wall -Wfloat-equal -std=gnu99 -g -ggdb -march=i686
> -lpthread -W -Wall -Wfloat-equal -std=gnu99 -g -ggdb -march=athlon-xp
> -lpthread -W -Wall -Wfloat-equal -std=gnu99 -g -ggdb
> -lpthread -W -Wall -Wfloat-equal -std=gnu99 -O2
>
> with various other custom static libraries linked, etc.

--
Lionel B