Possible performance issue with gfortran? denormalized numbers

Jose Miguel Reynolds Barredo <jmrb2002@xxxxxxxxx> · Mon, 1 Feb 2016 12:55:06 +0100

Hi everyone,

I was developing a block tridiagonal solver and found a performance
issue that is intriguing me: operations on numbers in the
denormalized range are around ten times slower than on regular numbers
(https://en.wikipedia.org/wiki/Denormal_number). As an example, I
built a very simple code:

program test
  implicit none
  integer :: j
  real :: m1(10000000),m2(10000000)

  do j=1,10
     ! if the exponent is changed to -37, the code is ten times faster
     m1=1.9123012391238E-39
     m2=1.2903458938459E0*m1
  enddo
  print *,'stop',m1(10000),m2(2323)
end program test

First, some comments about the code:

 - The vectors have to be long enough not to fit into the cache.
 - The long numeric values are arbitrary, chosen just to make sure that
the compiler optimizer is not doing something strange.
 - The values printed at the end are there to force the compiler
optimizer to actually run the code.
 - The code is compiled with the "gfortran -O2" option (version 4.4.7).

If we run this example in single precision, the code takes several
seconds, but if we change the value 1.91230123912389E-39 to
1.91230123912389E-38, the code takes ten times less time. The issue
has to do with being in the denormalized range. If I compile the code
with ifort, the case with 1.91230123912389E-39 gives the following
remark:

test.f90(14): remark #7920: The value was too small when converting to
REAL(KIND=4); the result is in the denormalized range.
[1.91230123912389E-39]
    m1=1.91230123912389E-39!321 for double precision

but the code runs as fast as the case with E-37.
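
The comparison described above can be reproduced in a single run with
a small timing harness. This is only a sketch: the program name, the
run_case helper, the use of allocatable arrays and the warm-up
assignment are assumptions added for illustration, and the timings
depend on the compiler, flags and CPU.

```fortran
program time_denormal
  implicit none
  integer, parameter :: n = 10000000
  real, allocatable :: m1(:), m2(:)

  allocate(m1(n), m2(n))
  ! Touch both arrays once so that first-use page faults are not
  ! charged to whichever case happens to run first.
  m1 = 0.0
  m2 = 0.0

  call run_case(1.9123012391238E-39, 'denormal (E-39)')
  call run_case(1.9123012391238E-37, 'normal   (E-37)')

contains

  ! Hypothetical helper: times the same loop as the test program
  ! for a given fill value.
  subroutine run_case(val, label)
    real, intent(in) :: val
    character(len=*), intent(in) :: label
    integer :: j, t0, t1, rate

    call system_clock(t0, rate)
    do j = 1, 10
       m1 = val
       m2 = 1.2903458938459E0*m1
    end do
    call system_clock(t1)

    ! Printing array elements keeps the optimizer from dropping the loop.
    print *, label, real(t1 - t0)/real(rate), ' s', m1(10000), m2(2323)
  end subroutine run_case

end program time_denormal
```

Compiling this with the same "gfortran -O2" option as the original test
should make the two cases directly comparable.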

Some extra notes:

- The issue does not appear if all the data stays inside the CPU or
the cache: with a smaller vector size there is no problem, which is
why the vectors have to be long enough.
- The same issue appears in double precision at exponents around -307,
right at the limit of the denormalized range.
- I ran this on several platforms (all Intel based, but with different
processor models) and the issue persists.
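
Where the denormalized range starts for each precision can be checked
with the standard tiny() intrinsic, which returns the smallest positive
normalized number of its argument's kind. The sketch below is
illustrative only: the program and variable names are assumptions, and
the double precision constant is taken from the "321" comment in the
ifort output above.

```fortran
program check_denormal
  implicit none
  real :: s
  double precision :: d

  s = 1.91230123912389E-39
  d = 1.91230123912389D-321

  ! Smallest positive normalized value for each kind.
  print *, 'smallest normal single:', tiny(s)
  print *, 'smallest normal double:', tiny(d)

  ! A nonzero value whose magnitude is below tiny() is denormalized.
  print *, 's is denormal:', (s /= 0.0 .and. abs(s) < tiny(s))
  print *, 'd is denormal:', (d /= 0.0d0 .and. abs(d) < tiny(d))
end program check_denormal
```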

You may say that no code works with such small numbers, but in my
case a tridiagonal solver acts, for some set of variables, as an
attractor to that range, and the values never reach exact zero. In
fact, the attractor can be simplified to this:

      aux1=-9*m1(j-1)+70*m2(j-1)
      aux2=-9*m2(j-1)
      m1(j)=( 7*aux1-54*aux2)/4790
      m2(j)=(77*aux1+69*aux2)/4790

(in this case the exact values are not critical, but they do matter).

Code iterating this converges to numbers in the denormalized range
and stays there.
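
A minimal driver for this recurrence might look like the sketch below.
The array length, the starting values of 1.0 and the first_sub counter
are assumptions chosen for illustration; they are not taken from the
original solver. The program reports the first index at which m2 is
nonzero but smaller in magnitude than tiny(1.0), and the final values.

```fortran
program attractor_demo
  implicit none
  integer, parameter :: n = 5000      ! assumed length, for illustration
  real :: m1(n), m2(n), aux1, aux2
  integer :: j, first_sub

  ! Assumed starting values.
  m1(1) = 1.0
  m2(1) = 1.0
  first_sub = 0

  do j = 2, n
     aux1 = -9*m1(j-1) + 70*m2(j-1)
     aux2 = -9*m2(j-1)
     m1(j) = ( 7*aux1 - 54*aux2)/4790
     m2(j) = (77*aux1 + 69*aux2)/4790

     ! Record the first index where m2 is nonzero but below the
     ! smallest normalized single precision value.
     if (first_sub == 0 .and. m2(j) /= 0.0 .and. abs(m2(j)) < tiny(1.0)) then
        first_sub = j
     end if
  end do

  print *, 'first denormal index (0 = none):', first_sub
  print *, 'final values:', m1(n), m2(n)
end program attractor_demo
```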

I have worked around the issue by setting values smaller than 1.0E-40
(for example) to 0.0E0, which is more than enough for my requirements,
but it would be great to find a better solution for this! Thank you
for any help!
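
The workaround described above could be written with a where
statement, as in the sketch below. The program name and the sample
array contents are made up for illustration. Only the 1.0E-40 cutoff
comes from the text; using tiny(1.0) as the cutoff instead would be one
possible variation that flushes every denormalized single precision
value, but that is an assumption, not what was done here.

```fortran
program flush_demo
  implicit none
  real, parameter :: threshold = 1.0E-40   ! cutoff quoted in the text
  real :: m(4)

  ! Made-up sample data: one normal value and three denormalized ones.
  m = (/ 1.0, 1.9123012391238E-39, -3.0E-41, 2.0E-41 /)

  ! Set every element whose magnitude is below the cutoff to zero.
  where (abs(m) < threshold) m = 0.0

  print *, m
end program flush_demo
```

With this cutoff, values between 1.0E-40 and tiny(1.0) are still
denormalized and are left untouched.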

JM