OpenMP: gcc vs. icc, erratic results with gcc

"diego sandoval" <diego.sandoval@xxxxxxxxxx> · Wed, 21 May 2008 02:34:36 -0500

Hi everybody,
I just started working with OpenMP. I first installed gcc-4.2.3 and
then gcc-4.3.0, both of which support OpenMP.
I tried a program that calculates the product pi*e. When I compile the
code with gcc (both 4.2.3 and 4.3.0) without -fopenmp, the result is
correct. When I compile it with the -fopenmp option, the result is wrong.
I also tried the Intel compiler icc (with -openmp) to verify that the
code is correct, and there was no problem. I don't know what is wrong
with gcc and this particular code, but the results are erratic. If any
of you can help me... thanks in advance.

Let me elaborate on this problem.

I am using gcc-4.3.0 on vanilla Slackware 12.0, on a quad-core SMP machine:

$ uname -a
Linux ra 2.6.24.3-smp #1 SMP Wed Feb 27 18:46:56 COT 2008 i686
Intel(R) Core(TM)2 Quad CPU    Q6600  @ 2.40GHz GenuineIntel GNU/Linux

$ gcc-4.3.0 -v
Using built-in specs.
Target: i686-pc-linux-gnu
Configured with: ./configure
--prefix=/home/medrano/compilers/gcc-4.3.0/ --enable-shared
--enable-languages=c,c++ --enable-threads=posix --enable-__cxa_atexit
--disable-checking --with-gnu-ld --verbose --program-suffix=-4.3.0
Thread model: posix
gcc version 4.3.0 (GCC)

I tried the OpenMP code below (from
http://www.kallipolis.com/openmp/taylor_mp.c), which is supposed to
calculate the product pi*e using Taylor series.
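For reference, these are the two series the program sums. The pi series is the Leibniz series, that is, the Taylor series of arctan(x) evaluated at x = 1. The program's pi loop adds one positive and one negative term per iteration, which corresponds to the second form:

```latex
e = \sum_{k=0}^{\infty} \frac{1}{k!},
\qquad
\pi = 4 \sum_{k=0}^{\infty} \frac{(-1)^k}{2k+1}
    = 4 \sum_{j=0}^{\infty} \left( \frac{1}{4j+1} - \frac{1}{4j+3} \right),
\qquad
e\,\pi \approx 8.539734
```

Because the pi series alternates, stopping after n pairs leaves an error of at most 4/(4n+1). With the program's n = num_steps*10 = 200000000, that error is far below the six decimals printed by %f. The correct output is therefore 8.539734, which matches the icc runs.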

$ gcc-4.3.0 -O2 -fopenmp taylor_mp.c -o taylor.gcc.out
$ icc -O2 -openmp taylor_mp.c -o taylor.intel.out

The results are:

$ ./taylor.gcc.out
Reached result 5.142145 in 10640.000 seconds ### wrong result
$ ./taylor.gcc.out
Reached result 10.795894 in 10660.000 seconds ### wrong result

$ ./taylor.intel.out
Reached result 8.539734 in 9950.000 seconds ### right result
$ ./taylor.intel.out
Reached result 8.539734 in 9570.000 seconds  ### right result

/*
 * taylor.c
 *
 * This program calculates the value of e*pi by first calculating e
 * and pi by their taylor expansions and then multiplying them
 * together.
 */

#include <stdio.h>

#include <time.h>

#define num_steps 20000000

int main(int argc, char *argv[])
{
  double start, stop; /* times of beginning and end of procedure */
  double e, pi, factorial, product;
  int i;

  /* start the timer */
  start = clock();

  /* Now there is no first and second: we calculate e and pi concurrently */
#pragma omp parallel sections shared(e, pi)
  {
#pragma omp section
    {
      printf("e started\n");

      e = 1;
      factorial = 1; /* rather than recalculating the factorial from
			scratch each iteration we keep it in this variable
			and multiply it by i each iteration. */
      for (i = 1; i<num_steps; i++) {

	factorial *= i;
	e += 1.0/factorial;
      }
      printf("e done\n");
    } /* e section */

#pragma omp section
    {
      /* In this thread we calculate the pi expansion */
      printf("pi started\n");

      pi = 0;
      for (i = 0; i < num_steps*10; i++) {
	/* we want 1/1 - 1/3 + 1/5 - 1/7 etc.
	   therefore we count by fours (0, 4, 8, 12...) and take
             1/(0+1) =  1/1
	   - 1/(0+3) = -1/3

             1/(4+1) =  1/5
	   - 1/(4+3) = -1/7 and so on */
	pi += 1.0/(i*4.0 + 1.0);
	pi -= 1.0/(i*4.0 + 3.0);
      }
      pi = pi * 4.0;
      printf("pi done\n");
    } /* pi section */

  } /* omp sections */
  /* at this point the threads should rejoin */

  product = e * pi;

  stop = clock();

  printf("Reached result %f in %.3f seconds\n", product, (stop-start)/1000);

  return 0;
}