Hi all,

I'm trying to figure out why I'm not getting any speedup at all in the following Fortran code, even though I've set nx and ny so large (1000 and 5000) that the computation takes 14-15s and I'm on an 8-core machine:

      SUBROUTINE testomp(C,nx,ny)
cf2py threadsafe
cf2py double precision dimension(nx,ny),intent(inplace)::C
cf2py integer intent(hide),depend(C)::nx=shape(C,0)
cf2py integer intent(hide),depend(C)::ny=shape(C,1)
      INTEGER nx,ny,i,j
      DOUBLE PRECISION C(nx,ny)

!$OMP PARALLEL DO
!$OMP& DEFAULT(SHARED) PRIVATE(i,j)
!$OMP& SCHEDULE(STATIC)
      do i=1,nx
        do j=1,ny
          C(i,j) = dexp(-C(i,j)**2)
        enddo
      enddo
!$OMP END PARALLEL DO NOWAIT

      return
      END

The 'cf2py's are directives to f2py, the Python-Fortran interface generator.

I'm using gfortran 4.2.1 from Ubuntu Gutsy's apt-get, but because Python needs to dlopen the shared object I applied the patch from bug 28482 ( http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28482 ), recompiled just libgomp, and directed the runtime linker to the new libgomp instead of the old ones.

Any help is greatly appreciated.

Thanks,
Anand Patil