2011/12/20 Vincent Lefevre <vincent+gcc@xxxxxxxxxx>:
> On 2011-12-20 10:34:35 +0000, Jonathan Wakely wrote:
>> On 20 December 2011 10:20, Ico wrote:
>> >
>> > Still, I'm not sure if sse is part of the problem and/or solution.
>>
>> It's the solution.
>>
>> > I have been reducing the program to see what the smallest code is
>> > that still shows this behaviour. Latest version is below.
>> >
>> >
>> > $ gcc -msse -mfpmath=sse -O3 -march=native test.c
>>
>> What is "native" for your system, i686? (also, what does gcc
>> -dumpmachine show?) i686 doesn't support SSE, you need at least
>> pentium3.
>
> I can reproduce the "problem" on an x86_64 machine, so it is not
> due to the traditional FPU. I just think that the multiplication
> by 0 is faster (because much easier than the generic case), as
> I've said in my other message. But to have such an optimization,
> I wouldn't complain. :)
>
> --
> Vincent Lefèvre <vincent@xxxxxxxxxx> - Web: <http://www.vinc17.net/>
> 100% accessible validated (X)HTML - Blog: <http://www.vinc17.net/blog/>
> Work: CR INRIA - computer arithmetic / Arénaire project (LIP, ENS-Lyon)

The problem doesn't manifest when the hardware flush-to-zero (FTZ)
mode is enabled. This flag makes the hardware replace every denormal
result of an operation with zero.

In the second program, if 0.5 < f < 1, the values of a and b
eventually get stuck at a tiny denormal value (the smallest
representable one, or a small multiple of it, depending on f) and
never change afterwards, so a large number of operations involve
denormal numbers. When f <= 0.5, in the default rounding mode, once a
reaches the smallest representable number the result of (a * f) is
zero, so denormal numbers are produced only a small number of times.

gcc enables FTZ when using SSE and -ffast-math (I think the specific
compiler flag is -funsafe-math-optimizations). Therefore the flags
needed are

  -msse2 -mfpmath=sse -ffast-math

Dario