On 11/7/2011 1:32 PM, Francisco Llaryora wrote:
I want to tell you about a bizarre behavior in executables compiled
with the gcc 4.2.1 compiler.
A few weeks ago I had to parallelize a lattice Boltzmann algorithm
using OpenMP directives (adding my own optimizations) to pass my
high-performance computing course.
I compile my C program like this:
$gcc -O3 -fopenmp -lm lattice-boltzmann.c -o output
In the lattice-boltzmann.c code I measure the time with two calls to omp_get_wtime().
I then run the output after setting the OMP_NUM_THREADS value:
$export OMP_NUM_THREADS=XXX
$./output # without the taskset program
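For reference, the timing described above can be sketched roughly as follows. This is only an illustration, not the code from lattice-boltzmann.c: dummy_kernel() is a hypothetical stand-in for the lattice Boltzmann loop, wall_seconds() and time_kernel() are made-up helper names, and the clock_gettime() fallback is an assumption added so that the sketch still builds without -fopenmp.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for clock_gettime() in the fallback */
#endif
#include <assert.h>
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#else
#include <time.h>
#endif

/* Wall-clock time in seconds: omp_get_wtime() when built with -fopenmp,
   otherwise a clock_gettime() fallback (an assumption of this sketch). */
double wall_seconds(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
#endif
}

/* Hypothetical stand-in for the real kernel: sums i * 0.5 for i in [0, n). */
double dummy_kernel(long n)
{
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; i++)
        sum += (double)i * 0.5;
    return sum;
}

/* Runs the kernel once between two clock reads and returns the elapsed
   wall-clock seconds; the kernel result is stored in *result. */
double time_kernel(long n, double *result)
{
    double start, stop;

    assert(n > 0 && result != NULL);
    start = wall_seconds();   /* first call, before the parallel work */
    *result = dummy_kernel(n);
    stop = wall_seconds();    /* second call, after the parallel work */
    return stop - start;
}
```

Built with gcc -fopenmp, calling time_kernel() once per OMP_NUM_THREADS setting and printing the returned value would give one time per run.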
The goal was to measure the speedup as the number of threads changes.
I ran it forty times, changing the value of OMP_NUM_THREADS from 1 to 40.
I ran it on a node with 40 Xeon cores: 4 processors with 10 cores each.
What follows is the time in seconds, not the speedup:
3.972002
2.052636
1.430107
1.111162
0.938760
0.811720
0.719483
0.666361
0.621093
0.563764
0.546574
0.510918
0.497733
0.481643
0.476792
0.454967
0.476283
0.445797
0.433787
0.426813
0.432890
0.436993
0.401258
0.424988
0.409322
0.416022
0.475070
0.425502
0.414787
0.434697
0.450460
0.428303
0.452609
0.453830
0.450324
0.461843
0.466831
0.464153
0.761500 // 39 threads
29.927848 // 40 threads
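As a sketch of how these times translate into speedup, assuming the usual definitions S(n) = T(1) / T(n) and E(n) = S(n) / n (the function names here are illustrative only):

```c
#include <assert.h>

/* Speedup of an n-thread run relative to the 1-thread run:
   S(n) = T(1) / T(n). Both times must be positive. */
double speedup(double t1, double tn)
{
    assert(t1 > 0.0 && tn > 0.0);
    return t1 / tn;
}

/* Parallel efficiency: E(n) = S(n) / n, where n is the thread count. */
double efficiency(double t1, double tn, int threads)
{
    assert(threads > 0);
    return speedup(t1, tn) / (double)threads;
}
```

With the times above, speedup(3.972002, 0.401258) is about 9.9 for the fastest run (23 threads), while speedup(3.972002, 29.927848) is about 0.13 for 40 threads, that is, slower than the single-thread run.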
Why does performance decrease when the number of threads approaches the
number of cores?
Which version of gcc resolves this behavior? Or how can I resolve it?
The problem is not the decrease in performance itself but its
magnitude.
I also compiled my C program like this (Intel 64 compiler, version 11.1):
$icc -fast -openmp lattice-boltzmann.c -o output
The performance also decreases, but not by the same magnitude:
0.336723 // 39 threads
0.756676 // 40 threads, and not 14.22 sec, for example.
Redirecting to gcc-help.
How did it do with taskset or GOMP_CPU_AFFINITY settings, perhaps
with a more up-to-date libgomp?
Do you have another job running?
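One way to see which CPUs a run is actually allowed to use (for example after taskset) is to read the Cpus_allowed_list field from /proc/self/status. The following is a minimal sketch, assuming Linux; count_cpus_in_list() and allowed_cpu_count() are hypothetical helper names, not part of libgomp or any other library.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Counts the CPUs named in a list such as "0-9,20-29" (the format used by
   taskset -c and by Cpus_allowed_list). Returns -1 on a parse error. */
int count_cpus_in_list(const char *list)
{
    int count = 0;
    const char *p = list;

    assert(list != NULL);
    while (*p != '\0' && *p != '\n') {
        char *end;
        long first = strtol(p, &end, 10);
        long last = first;

        if (end == p)
            return -1;      /* no digits where a CPU number was expected */
        p = end;
        if (*p == '-') {    /* a range "first-last" */
            p++;
            last = strtol(p, &end, 10);
            if (end == p || last < first)
                return -1;
            p = end;
        }
        count += (int)(last - first + 1);
        if (*p == ',')
            p++;
    }
    return count;
}

/* Number of CPUs this process may run on according to /proc/self/status
   (Linux only), or -1 if the field cannot be read or parsed. */
int allowed_cpu_count(void)
{
    static const char key[] = "Cpus_allowed_list:";
    char line[4096];
    int count = -1;
    FILE *f = fopen("/proc/self/status", "r");

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, key, sizeof key - 1) == 0) {
            count = count_cpus_in_list(line + sizeof key - 1);
            break;
        }
    }
    fclose(f);
    return count;
}
```

If allowed_cpu_count() reports fewer CPUs than OMP_NUM_THREADS, the threads are oversubscribed, which would be one possible (but not the only) explanation for a sharp slowdown.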
--
Tim Prince