Re: gfortran 4.2 with openMP: why no speedup?

"Anand Patil" <anand.prabhakar.patil@xxxxxxxxx> · Sat, 12 Jan 2008 14:34:28 -0800

Nelson,

Thanks for your advice. I just figured out that the apparent lack of
speedup was illusory: I was looking at CPU time rather than wall-clock
time. That resolves my primary concern, but...
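
For anyone who hits the same confusion, here is a minimal sketch of the
two kinds of timing. It is not my actual subroutine: the program name,
the variable names and the sum-of-squares loop are made up for
illustration, and only nx and ny match the sizes I mentioned. My
understanding is that with gfortran, cpu_time reports CPU time added up
over all threads, so it does not drop when more threads are used, while
system_clock measures elapsed time.

```fortran
program timing_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: i8 = selected_int_kind(18)
  integer, parameter :: nx = 1000, ny = 5000   ! sizes from the thread
  real(dp), allocatable :: c(:,:)
  real(dp) :: total
  real :: cpu_start, cpu_end
  integer(i8) :: tick_start, tick_end, tick_rate
  integer :: i, j

  allocate(c(nx, ny))
  c = 1.0_dp
  total = 0.0_dp

  call cpu_time(cpu_start)                   ! processor time
  call system_clock(tick_start, tick_rate)   ! elapsed (wall-clock) ticks

  ! Hypothetical workload: a sum of squares over the whole array.
  !$omp parallel do private(i) reduction(+:total)
  do j = 1, ny
     do i = 1, nx
        total = total + c(i,j) * c(i,j)
     end do
  end do
  !$omp end parallel do

  call cpu_time(cpu_end)
  call system_clock(tick_end)

  print *, 'sum            =', total
  print *, 'CPU time (s)   =', cpu_end - cpu_start
  print *, 'wall clock (s) =', real(tick_end - tick_start, dp) / real(tick_rate, dp)
end program timing_sketch
```

Built with "gfortran -fopenmp", the wall-clock figure is the one that
should shrink as threads are added. Without -fopenmp the directives are
treated as comments and the loop runs serially. omp_get_wtime() from the
omp_lib module is another way to get elapsed time.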

> (1) Fortran arrays are stored with the first subscript increasing most
> rapidly, the opposite of that used for C and C++.  Reversing the loop
> order will make better use of cache.

This made a huge difference with or without OpenMP, thanks!
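
A minimal sketch of what the reordering amounts to, again with made-up
names and a made-up sum-of-squares body rather than my real code. The
first nest walks the second subscript in the inner loop, so consecutive
iterations are nx elements apart in memory. The second nest walks the
first subscript in the inner loop, which is contiguous in Fortran.

```fortran
program loop_order_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: nx = 1000, ny = 5000   ! sizes from the thread
  real(dp), allocatable :: c(:,:)
  real(dp) :: s_strided, s_contig
  integer :: i, j

  allocate(c(nx, ny))
  call random_number(c)

  ! Cache-unfriendly order: the inner loop varies the SECOND subscript,
  ! so each step jumps nx elements through memory.
  s_strided = 0.0_dp
  do i = 1, nx
     do j = 1, ny
        s_strided = s_strided + c(i,j) * c(i,j)
     end do
  end do

  ! Cache-friendly order: the inner loop varies the FIRST subscript,
  ! so each step touches the next element in memory.
  s_contig = 0.0_dp
  do j = 1, ny
     do i = 1, nx
        s_contig = s_contig + c(i,j) * c(i,j)
     end do
  end do

  print *, 'strided    sum =', s_strided
  print *, 'contiguous sum =', s_contig
end program loop_order_sketch
```

The two sums can differ in the last few digits because the additions
happen in a different order. An optimizing compiler may also interchange
the loops itself, so any timing difference depends on the flags used.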

> (2) The second problem is the dimensions ("I've set nx and ny so large
> (1000 and 5000...").  To avoid cache conflicts, you want to choose the
> number of rows to be something other than a power of 2: a prime number
> is often a good choice.  I have an example in my files of a program
> that ran about 3 times faster just by changing a row dimension from
> 256 (where there were cache collisions along the row) to 257 (where
> cache collisions are rare).

I would NEVER have figured this out, thanks. In the current
application the problem dictates the sizes of my arrays, so I can't
really use the tip, but I'll keep it in mind in the future.
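
For future reference, my reading of the tip is that only the allocated
leading dimension has to change, not the logical size of the problem.
The sketch below is an assumption about how that might look, using the
256 and 257 from your example. The names nrows, ncols and ld are
hypothetical.

```fortran
program padding_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: nrows = 256        ! logical row count, a power of 2
  integer, parameter :: ncols = 256
  integer, parameter :: ld = nrows + 1     ! padded leading dimension (257)
  real(dp), allocatable :: a(:,:)
  real(dp) :: row_sum
  integer :: j

  ! Allocate one extra, unused row so that consecutive elements of a
  ! row are 257 elements apart in memory instead of 256.
  allocate(a(ld, ncols))
  a = 0.0_dp
  a(1:nrows, :) = 1.0_dp                   ! only the logical block is used

  ! Walking along a row: the memory stride is ld, not nrows.
  row_sum = 0.0_dp
  do j = 1, ncols
     row_sum = row_sum + a(1, j)
  end do

  print *, 'row sum =', row_sum
end program padding_sketch
```

Whether this helps at all depends on the cache organization of the
machine, so it would need to be timed rather than assumed.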

> You should also check the generated assembly code (f77 -S foo.f)
> whether C(i,j)**2 is compiled into the inline code C(i,j)*C(i,j), or
> into a call to the run-time library power function, and also whether the
> subscript address computations are eliminated.

I'll just inline it manually to be sure. I was trying to get a speedup
from OpenMP in that subroutine, not necessarily to optimize it overall.

Anand