Hi All,

I am trying to familiarize myself with OpenMP and GOMP. Just for testing purposes I made some small apps that I compile on an FC6 system, which has gcc 4.1 with the GOMP patches. I noticed it is almost impossible to get a performance boost.

I do see that the work runs in different threads when, e.g., I parallelize a small loop of sleep(1)'s. However, when I try to optimize any of the examples from the several presentations, the performance gain is negative, even with constructions that have a big inner loop and a small outer loop. E.g.:

    #include <omp.h>
    #include <stdio.h>

    int main ()
    {
      char buf[10000];
      int i, q;
      int sum = 0;

    #ifdef OMP
      omp_set_num_threads (2);
    #endif

      for (q = 0; q < 2; q++)
        {
    #pragma omp parallel for schedule(static,20)
          for (i = 0; i < 100; i++)
            {
              int j;
              int k = i;
              int sum = 0;

              for (j = 0; j < 10000000; j++)
                sum += j - k;
              buf[k] = sum;
            }
        }

      for (i = 0; i < 100; i++)
        sum += buf[i];
      printf ("%d", sum);
      return 0;
    }

Build and run:

    gcc -fopenmp -O2 -DOMP test2.c -o test2mp ; gcc -O2 test2.c -o test2; time ./test2mp; time ./test2

Output:

    0
    real    0m1.469s
    user    0m2.700s
    sys     0m0.000s

    0
    real    0m1.485s
    user    0m1.484s
    sys     0m0.000s

I read some threads about cache conflicts in the L1 and L2 caches, so I changed the program to avoid accessing the global variables too often, and even copied the iterator into a local variable. None of this made a significant difference.

Can anyone shed some light on what I am doing wrong?

Best regards,
Ruud