Anand Patil wrote:
> Tim,
>
>> Do you expect your gfortran to interchange the loops? Does it do it?
>> Was this excerpt designed to prevent your current platform, whatever it
>> may be, suffer in comparison with some other?
>
> You're giving me way too much credit. :) I was just trying to figure
> out why OpenMP wasn't giving me any speedup over the serial version, but
> in fact it was.
>
> Re: your question, I exchanged the loops manually and got much better
> performance, so I'm guessing gfortran didn't exchange them.
>

The slogan from about 20 years back, "concurrent [threaded parallel] outer, vector inner," still applies. Vectorization, particularly on Intel or AMD processors, is facilitated by stride 1 inner loops, which also take full advantage of the cache, hardware prefetch, and read/write combine buffering of most current CPUs.

I guess you figured out that OpenMP normally involves increased total CPU time, if you add up all the threads, so a CPU-time measurement can hide a real speedup in elapsed time. The OpenMP standard provides a timer function, omp_get_wtime, which returns elapsed (wall-clock) time. It's usually a wrapper for one of the system functions, likely with better resolution than cpu_time.
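
For what it's worth, here is a minimal sketch of both points, not your actual code: the array names, the size n, and the helper wall_seconds are made up for illustration. The outer loop over columns is threaded, the inner loop over rows runs at stride 1 (Fortran arrays are column-major), and the same region is timed with both omp_get_wtime and cpu_time. The `!$` lines are compiled only when OpenMP is enabled, so the file should also build as a serial program, falling back to system_clock for elapsed time.

```fortran
program outer_concurrent_inner_vector
  !$ use omp_lib
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 2000            ! hypothetical problem size
  real(dp), allocatable :: a(:,:), b(:,:)
  real(dp) :: wall0, wall1
  real :: cpu0, cpu1
  integer :: i, j

  allocate(a(n,n), b(n,n))
  b = 1.0_dp

  call cpu_time(cpu0)
  wall0 = wall_seconds()

  ! Threads share the outer loop over columns (j). The inner loop over
  ! rows (i) touches consecutive memory locations, since the first index
  ! varies fastest in Fortran, so it is a candidate for vectorization.
  !$omp parallel do private(i)
  do j = 1, n
     do i = 1, n
        a(i,j) = 2.0_dp * b(i,j) + 1.0_dp
     end do
  end do
  !$omp end parallel do

  wall1 = wall_seconds()
  call cpu_time(cpu1)

  print '(a,f12.4)', 'elapsed (wall) seconds: ', wall1 - wall0
  print '(a,f12.4)', 'cpu_time seconds:       ', cpu1 - cpu0
  ! Every element of a is 3, so the sum is 3*n*n; printing it also keeps
  ! the compiler from discarding the loop.
  print '(a,f12.1)', 'checksum:               ', sum(a)

contains

  ! Hypothetical helper: elapsed time in seconds. Uses omp_get_wtime when
  ! compiled with OpenMP, system_clock otherwise.
  function wall_seconds() result(t)
    real(dp) :: t
    integer, parameter :: i8 = selected_int_kind(18)
    integer(i8) :: ticks, rate
    call system_clock(ticks, rate)
    t = real(ticks, dp) / real(rate, dp)
    !$ t = omp_get_wtime()
  end function wall_seconds

end program outer_concurrent_inner_vector
```

Built with something like `gfortran -O3 -fopenmp`, the cpu_time figure will typically be about the same as or larger than the serial one, since it usually adds up the time of all threads, while the elapsed figure is the one that should drop as threads are added. Swapping the two do loops gives the strided version for comparison.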