Re: gfortran 4.2 with openMP: why no speedup?

Tim Prince <tprince@xxxxxxxxxxxxx> · Sat, 12 Jan 2008 14:54:39 -0800

Anand Patil wrote:
> Tim,
> 
>> Do you expect your gfortran to interchange the loops?  Does it do it?
>> Was this excerpt designed to prevent your current platform, whatever it
>> may be, from suffering in comparison with some other?
> 
> You're giving me way too much credit. :) I was just trying to figure
> out why OpenMP wasn't giving me any speedup over the serial version,
> but in fact it was.
> 
> Re: your question, I exchanged the loops manually and got much better
> performance, so I'm guessing gfortran didn't exchange them.
> 

The slogan from about 20 years back, "concurrent [threaded parallel]
outer, vector inner," still applies.  Vectorization, particularly on
Intel or AMD processors, is facilitated by stride 1 inner loops, which
also take full advantage of the cache, hardware prefetch, and read/write
combine buffering of most current CPUs.

I guess you figured out that OpenMP normally increases total CPU time,
if you add up all the threads, so speedup has to be judged by elapsed
time rather than CPU time.  The OpenMP standard provides a timer
function, omp_get_wtime, which reports elapsed (wall-clock) time.  It's
usually a wrapper for one of the system functions, likely with better
resolution than cpu_time.