Anand Patil wrote:
>
> I'm trying to figure out why I'm not getting any speedup at all in the
> following Fortran code, even though I've set nx and ny so large (1000
> and 5000) that the computation takes 14-15s and I'm on an 8-core
> machine:
>
>       SUBROUTINE testomp(C,nx,ny)
>
> cf2py threadsafe
> cf2py double precision dimension(nx,ny),intent(inplace)::C
> cf2py integer intent(hide),depend(C)::nx=shape(C,0)
> cf2py integer intent(hide),depend(C)::ny=shape(C,1)
>
>       INTEGER nx,ny,i,j
>       DOUBLE PRECISION C(nx,ny)
>
> !$OMP PARALLEL DO
> !$OMP& DEFAULT(SHARED) PRIVATE(i,j)
> !$OMP& SCHEDULE(STATIC)
>       do i=1,nx
>           do j=1,ny
>               C(i,j) = dexp(-C(i,j)**2)
>           enddo
>       enddo
> !$OMP END PARALLEL DO NOWAIT
>
>       return
>       END

Do you expect your gfortran to interchange the loops? Does it do it?
Was this excerpt designed to prevent your current platform, whatever it
may be, from suffering in comparison with some other?
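For reference, here is a minimal sketch of the loop-interchanged version, written in free-form Fortran; the module and subroutine names (testomp_mod, testomp_swapped) are hypothetical. Fortran arrays are column-major, so C(i,j) and C(i+1,j) are adjacent in memory, while in the quoted code the inner j loop jumps by nx doubles (8000 bytes for nx=1000) on every iteration. Putting j outermost makes the inner loop walk contiguous memory and hands each thread a contiguous block of columns. The sketch also drops NOWAIT, which as far as I know the OpenMP specification does not allow on END PARALLEL DO (the parallel region ends with an implied barrier regardless). It has not been timed against the 14-15s figure above, so treat any speedup as something to measure, not assume.

```fortran
module testomp_mod
  implicit none
contains
  ! Hypothetical column-major-friendly variant of the quoted testomp.
  subroutine testomp_swapped(c, nx, ny)
! f2py directives carried over from the original (free-form "!f2py" prefix);
! gfortran treats these lines as plain comments.
!f2py threadsafe
!f2py double precision dimension(nx,ny),intent(inplace)::c
!f2py integer intent(hide),depend(c)::nx=shape(c,0)
!f2py integer intent(hide),depend(c)::ny=shape(c,1)
    integer, intent(in) :: nx, ny
    double precision, intent(inout) :: c(nx, ny)
    integer :: i, j

    ! Parallelize over columns (j); each thread then sweeps the
    ! contiguous first index (i) in the inner loop.
    !$omp parallel do default(shared) private(i, j) schedule(static)
    do j = 1, ny
      do i = 1, nx
        c(i, j) = exp(-c(i, j)**2)
      end do
    end do
    !$omp end parallel do
  end subroutine testomp_swapped
end module testomp_mod
```

Compile with gfortran -fopenmp so the !$omp lines take effect; without that flag they are ordinary comments and the routine runs serially, which is handy for checking that the results match the original loop order.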