Re: Possible performance issue with gfortran? denormalized numbers

On 2/2/2016 3:33 PM, Toon Moene wrote:
> On 02/02/2016 09:15 PM, Tim Prince wrote:
>
>> On 2/2/2016 2:30 PM, Toon Moene wrote:
>
>>> On 02/01/2016 12:55 PM, Jose Miguel Reynolds Barredo wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> I was developing a tridiagonal block solver and I found a performance
>>>> issue that is intriguing me: doing operations with numbers in the
>>>> denormalized range is around ten times slower than with regular numbers
>>>> (https://en.wikipedia.org/wiki/Denormal_number). As an example, I
>>>> built a very simple code:
>>>
>>> The reason operations with denormal numbers are ten times slower has -
>>> unfortunately, as otherwise we could do something about it - nothing to
>>> do with the compiler or the run time library (of *any* language).
>>>
>>> Denormal number operations are handled by the operating system, because
>>> it is too costly to allocate silicon to handle them on the CPU. So when
>>> the CPU detects a denormal number, it traps. This trap is caught by the
>>> OS, which dispatches the computation to a routine written for the
>>> purpose. The trap and the software implementation of the handling of
>>> the
>>> operation involving a denormal are costly, as you observed.
>>>
>>> There is nothing the compiler writers (*any* compiler writers, not just
>>> GCC's) can do about this.
>>>
>> You seem to have been unwilling to spend an additional minute reading
>> the remainder of that Wikipedia article.
>> gcc/gfortran bury the initialization of the CPU to abrupt underflow in
>> the -ffast-math option.  ifort makes it a default.
>
> That is a deliberate choice. GCC (the whole compiler collection,
> independent of language) will generate code *by default* that adheres
> to the IEEE 754 standard as far as we are able.
I have no doubt that defaulting to gradual underflow is a good decision
(and one ifort might also have made, were it not for historical reasons),
but gfortran fails some testsuite cases on win64, because gfortran simply
takes the underflow setting from the OS.
>
>> This removes the
>> performance problem, with the consequence of inaccurate results when
>> underflow is involved.
>
> It is the *user's* choice to go for force-underflow-to-zero. Note that
> this might also mean that code that performs normally without this
> option, might get a floating point exception because of division by
> zero if the force-underflow-to-zero option is enabled; therefore, it
> *has* to be the user's choice, because it depends on the behavior of
> the algorithm involved.
The only standard Fortran way to choose the mode, overriding the choice
made by the OS, is via the IEEE_arithmetic module.  In some compilers
(including gfortran?) IEEE_arithmetic isn't supported in general under
options which permit optimizations such as SIMD sum reduction, and no
one is willing to tell us which IEEE_arithmetic features should work
then.  If the IEEE_arithmetic facility is a satisfactory way to do it, the
corresponding testsuite cases should use it.
>
>> Intel CPUs released from Sandy Bridge on have eliminated the penalty for
>> addition/subtraction involving subnormals, largely on account of the
>> widespread use of gradual underflow with gcc.  A significant penalty for
>> multiplication remains.
>
> Note that the underflow might also be a problem in a (math) library
> routine, which - if supplied with the operating system - one has no
> control over.
Math library functions which set their own underflow mode must capture
the mode in effect on entry and restore it on exit.
In fact, it may be possible to break Microsoft Visual Studio math
libraries by setting gradual underflow, as the calling application is
expected to set abrupt underflow in 32-bit mode, whereas the OS does that
in 64-bit mode.
>
>> One might question whether a CPU from over 5
>> years ago need be your primary target for new project development.
>> SPARC CPUs historically had a similar problem with underflow.
>
> The SPARC operating system blew up the problem by having a counter in
> every routine of the operating system (which was a 64 bit counter,
> horrible to update in a 32-bit OS).
>
>> Intel Itanium CPUs had an extremely severe problem with partial
>> underflow as well as with true underflow, so operation in gradual
>> underflow mode was impractical.
>
> Don't tell me about it - we used an SGI Altix for 5 years ...
>
> Good points, though - thanks,
>

-- 
Tim Prince
