Re: Possible performance issue with gfortran? denormalized numbers

On 02/02/2016 09:15 PM, Tim Prince wrote:

On 2/2/2016 2:30 PM, Toon Moene wrote:

On 02/01/2016 12:55 PM, Jose Miguel Reynolds Barredo wrote:

Hi everyone,

I was developing a tridiagonal block solver and I found a performance
issue that intrigues me: doing operations with numbers in the
denormalized range is around ten times slower than with regular numbers
(https://en.wikipedia.org/wiki/Denormal_number). As an example, I
built a very simple code:
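(The program itself was snipped above; a minimal sketch of the kind of
loop you describe - every constant and variable name here is a guess on
my part, not your original code - would be:

program denormal_bench
  implicit none
  integer, parameter :: n = 10**8
  real(8) :: x, s
  integer :: i
  real :: t0, t1

  x = 1.0d-300            ! a normal double
  s = 0.0d0
  call cpu_time(t0)
  do i = 1, n
     s = s * 0.999d0 + x  ! recurrence, so the multiply cannot be hoisted
  end do
  call cpu_time(t1)
  print *, 'normal:    ', t1 - t0, ' s   s =', s

  x = 1.0d-312            ! subnormal: below the ~2.2d-308 normal limit
  s = 0.0d0
  call cpu_time(t0)
  do i = 1, n
     s = s * 0.999d0 + x  ! s converges to 1000*x = 1d-309, still subnormal
  end do
  call cpu_time(t1)
  print *, 'subnormal: ', t1 - t0, ' s   s =', s
end program denormal_bench

Compiled with plain 'gfortran -O2' - i.e. without -ffast-math - the
second loop should show roughly the slowdown you measured.)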

The reason operations with denormal numbers are ten times slower has -
unfortunately, as otherwise we could do something about it - nothing to
do with the compiler or the run time library (of *any* language).

Denormal-number operations are handled outside the CPU's fast path,
because it is too costly to allocate silicon to handle them at full
speed. When the CPU detects a denormal, it traps to a slow assist
routine - microcode on current x86, the operating system on some other
architectures - which carries out the computation in software. That
trap and the software handling of the operation are what cost you the
factor of ten you observed.

There is nothing the compiler writers (*any* compiler writers, not just
GCC's) can do about this.

You seem to have been unwilling to spend an additional minute reading
the remainder of that Wikipedia article.
gcc/gfortran bury the code that initializes the CPU for abrupt underflow
in the -ffast-math option.  ifort makes it the default.

That is a deliberate choice. GCC (the whole compiler collection, independent of language) will generate code *by default* that adheres to the IEEE 754 standard as far as we are able.

This removes the
performance problem, with the consequence of inaccurate results when
underflow is involved.

It is the *user's* choice to force underflow to zero. Note that code which runs fine without this option might hit a floating point exception (division by zero) once flush-to-zero is enabled; so it *has* to be the user's choice, because it depends on the behavior of the algorithm involved.
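For completeness: since Fortran 2003 the user can make that choice in
the program itself, without -ffast-math, through the intrinsic IEEE
modules (gfortran supports underflow control since GCC 5, if I recall
correctly). A minimal sketch:

program abrupt_underflow
  use, intrinsic :: ieee_arithmetic
  implicit none
  real(8) :: x

  x = tiny(x)                  ! smallest normal double, ~2.2d-308

  if (ieee_support_underflow_control(x)) then
     ! gradual=.false. selects abrupt (flush-to-zero) underflow;
     ! gradual=.true. restores the IEEE default of gradual underflow.
     call ieee_set_underflow_mode(gradual=.false.)
  end if

  print *, x / 16.0d0          ! 0.0 in abrupt mode, a subnormal otherwise
end program abrupt_underflow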

Intel CPUs released from Sandy Bridge on have eliminated the penalty for
addition/subtraction involving subnormals, largely on account of the
widespread use of gradual underflow with gcc.  A significant penalty for
multiplication remains.
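That distinction is easy to see in a small test - assuming a Sandy
Bridge or later CPU; the constants below are mine, not from any of the
posts in this thread:

program addmul_subnormal
  implicit none
  integer, parameter :: n = 10**8
  real(8) :: a, s, p
  integer :: i
  real :: t0, t1

  a = 1.0d-310                 ! subnormal double

  s = 1.0d0
  call cpu_time(t0)
  do i = 1, n
     s = s + a                 ! subnormal addend; result rounds back to 1.0
  end do
  call cpu_time(t1)
  print *, 'add with subnormal operand :', t1 - t0, ' s   s =', s

  p = a
  call cpu_time(t0)
  do i = 1, n, 2
     p = p * 2.0d0             ! 2d-310, still subnormal
     p = p * 0.5d0             ! back to 1d-310
  end do
  call cpu_time(t1)
  print *, 'mul with subnormal operands:', t1 - t0, ' s   p =', p
end program addmul_subnormal

On such a CPU the first loop should run at full speed, while the second
should still pay the assist penalty on every multiply.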

Note that the underflow might also occur inside a (math) library routine, over which - if it is supplied with the operating system - one has no control.

One might question whether a CPU from over 5
years ago need be your primary target for new project development.
SPARC CPUs historically had a similar problem with underflow.

The SPARC operating system compounded the problem by keeping a counter for it in every routine of the operating system - a 64-bit counter, horrible to update in a 32-bit OS.

Intel Itanium CPUs had an extremely severe problem with partial
underflow as well as with true underflow, so operation in gradual
underflow mode was impractical.

Don't tell me about it - we used an SGI Altix for 5 years ...

Good points, though - thanks,

--
Toon Moene - e-mail: toon@xxxxxxxxx - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
Progress of GNU Fortran: http://gcc.gnu.org/wiki/GFortran#news



