Re: Floating point performance issue

Dario Saccavino <kathoum@xxxxxxxxx> · Tue, 20 Dec 2011 12:48:53 +0100

2011/12/20 Vincent Lefevre <vincent+gcc@xxxxxxxxxx>:
> On 2011-12-20 10:34:35 +0000, Jonathan Wakely wrote:
>> On 20 December 2011 10:20, Ico wrote:
>> >
>> > Still, I'm not sure if sse is part of the problem and/or solution.
>>
>> It's the solution.
>>
>> > I have been reducing the program to see what the smallest code is
>> > that still shows this behaviour. Latest version is below.
>> >
>> >
>> > $ gcc -msse -mfpmath=sse -O3 -march=native test.c
>>
>> What is "native" for your system, i686? (also, what does gcc
>> -dumpmachine show?) i686 doesn't support SSE, you need at least
>> pentium3.
>
> I can reproduce the "problem" on an x86_64 machine, so it is not
> due to the traditional FPU. I just think that the multiplication
> by 0 is faster (because much easier than the generic case), as
> I've said in my other message. But to have such an optimization,
> I wouldn't complain. :)
>
> --
> Vincent Lefèvre <vincent@xxxxxxxxxx> - Web: <http://www.vinc17.net/>
> 100% accessible validated (X)HTML - Blog: <http://www.vinc17.net/blog/>
> Work: CR INRIA - computer arithmetic / Arénaire project (LIP, ENS-Lyon)

The problem doesn't manifest when the hardware flush-to-zero (FTZ)
mode is enabled. In this mode the hardware replaces every denormal
result of an operation with zero.

In the second program, if 0.5 < f < 1, the values of a and b eventually
get stuck at a tiny nonzero denormal value (the smallest representable
one when f < 0.75) and never change afterwards, so a large number of
the operations involve denormal numbers.
When f <= 0.5, in the default rounding mode (round to nearest, ties to
even), once a reaches the smallest representable denormal the result
of (a * f) is zero. Therefore denormal numbers are produced only a
small number of times.

gcc enables FTZ when using SSE and -ffast-math (I think the specific
compiler flag is -funsafe-math-optimizations).
Therefore the flags needed are -msse2 -mfpmath=sse -ffast-math.

    Dario