Hi everyone,

I was developing a tridiagonal block solver and I found a performance issue that is intriguing me: operations on numbers in the denormalized range (https://en.wikipedia.org/wiki/Denormal_number) are around ten times slower than on regular numbers. As an example, I built a very simple program:

    program test
      implicit none
      integer :: j
      real :: m1(10000000),m2(10000000)
      do j=1,10
        m1=1.9123012391238E-39 !if the exponent is changed to -37, the code is ten times faster
        m2=1.2903458938459E0*m1
      enddo
      print *,'stop',m1(10000),m2(2323)
    end program test

First, some comments about the code:

- The vectors have to be long enough not to fit into the cache.
- The long number values are arbitrary, just to make sure that the compiler optimizer is not doing something strange.
- The final print statement is there to force the compiler optimizer to actually run the code.
- The code is compiled with "gfortran -O2" (version 4.4.7).

If we run this example in single precision, the code takes several seconds, but if we change the value 1.91230123912389E-39 to 1.91230123912389E-38, the code takes ten times less time. The issue has to do with being in the denormalized range.

If I compile the code with ifort, the case with 1.91230123912389E-39 gives the following remark:

    test.f90(14): remark #7920: The value was too small when converting to REAL(KIND=4); the result is in the denormalized range.   [1.91230123912389E-39]
    m1=1.91230123912389E-39!321 for double precision

but the code runs as fast as the case with E-37.

Some extra tips:

- The issue does not appear if all calculations stay inside the CPU or the cache: with a smaller vector size there is no problem, which is why the vector has to be long enough.
- The same issue appears in double precision at an exponent of around -307, right at the limit of the denormalized range.
- I ran it on several platforms (all Intel based, but with different processor models) and the issue persists.
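To make the boundary of the denormalized range concrete, here is a minimal sketch using the standard intrinsic tiny(), which returns the smallest positive normalized number of a given kind. The program and variable names are only illustrative, and the last line assumes the compiler settings do not flush denormals to zero:

```fortran
program denormal_check
  implicit none
  real :: x
  double precision :: y

  ! tiny() gives the smallest positive *normalized* number of each kind
  print *, 'smallest normal (single):', tiny(x)
  print *, 'smallest normal (double):', tiny(y)

  x = 1.9123012E-39   ! the value used in the test program, rounded to single precision

  ! a nonzero value whose magnitude is below tiny() lies in the denormalized range
  print *, 'in denormalized range:', (x /= 0.0 .and. abs(x) < tiny(x))
end program denormal_check
```

The same comparison against tiny() can be applied to any variable to check whether it has drifted below the normalized range.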
You may say that no real code runs on such small numbers, but in my case the tridiagonal solver acts as an attractor to that range for some set of variables, and they never reach exact zero. In fact, the attractor can be simplified to this:

    aux1=-9*m1(j-1)+70*m2(j-1)
    aux2=-9*m2(j-1)
    m1(j)=( 7*aux1-54*aux2)/4790
    m2(j)=(77*aux1+69*aux2)/4790

(in this case the exact values are not critical, but they are important). Code iterating this converges to numbers in the denormalized range and stays there.

I have fixed the issue by simply setting values smaller than, for example, 1.0E-40 to 0.0E0, which is more than enough for my requirements, but it would be great to find a better solution!

Thank you for any help!

JM
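For reference, the workaround mentioned above (zeroing values below a small cutoff) could look roughly like this. It is only a sketch: the cutoff 1.0E-40 is the example figure from the post, and the program name, array contents and the name "threshold" are made up for illustration.

```fortran
program flush_small
  implicit none
  real, parameter :: threshold = 1.0E-40   ! assumed cutoff, example value from the post
  real :: m1(5)

  ! hypothetical sample data: one normal value, three denormalized values, one zero
  m1 = (/ 1.0E0, 1.9123012E-39, 3.0E-41, -2.0E-42, 0.0E0 /)

  ! replace every element whose magnitude is below the cutoff by exact zero
  where (abs(m1) < threshold) m1 = 0.0E0

  print *, m1
end program flush_small
```

Note that a cutoff of 1.0E-40 still leaves values between it and tiny(m1) in the denormalized range; using tiny(m1) as the cutoff would zero every denormalized value instead.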