gcc-6.3.0 support for OpenMP device pragmas

Siegmar Gross <siegmar.gross@xxxxxxxxxxxxxxxxxxxxxx> · Tue, 14 Feb 2017 14:25:05 +0100

Hi,

I have installed gcc-6.3.0 with offloading support for OpenACC and OpenMP
for an NVIDIA Quadro K2200 on "SUSE Linux Enterprise Server 12.2 (x86_64)".
We use CUDA Toolkit 8.0. I've implemented a small program that computes
a dot product with OpenMP device pragmas. Unfortunately, the program
doesn't detect my NVIDIA GPU: omp_get_num_devices() returns 0.

loki OpenACC 272 gcc -fopenmp dot_prod_accel_OpenMP.c
loki OpenACC 273 a.out
Number of processors:     24
Number of devices:        0
Default device:           0
Execution on host device: yes
sum = 6.000000e+08
loki OpenACC 274
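
A caveat about the output above: the line "Execution on host device"
comes from omp_is_initial_device() called in ordinary host code, where
it always reports the host, so only "Number of devices: 0" says anything
about the GPU. A minimal sketch of a check made from inside a target
region could look like the following. The helper name
target_runs_on_host is hypothetical and not part of the attached program.

```c
#ifdef _OPENMP
  #include <omp.h>
#endif

/* Hypothetical helper: returns 1 if a target region executes on the
 * host (fallback), 0 if it executes on an offload device.
 */
int target_runs_on_host (void)
{
  int on_host = 1;			/* assume host fallback		*/

  #ifdef _OPENMP
    /* the query must run inside the target region; the result is
     * copied back to the host through the "from" mapping
     */
    #pragma omp target map(from: on_host)
    {
      on_host = (omp_is_initial_device () != 0);
    }
  #endif
  return on_host;
}
```

Calling target_runs_on_host() from main and printing the result should
report the host whenever no device is available or the code was compiled
without -fopenmp.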

The output of "gcc -v" shows how I configured "gcc".

loki OpenACC 214 gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/local/gcc-6.3.0_accel/libexec/gcc/x86_64-pc-linux-gnu/6.3.0/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
Target: x86_64-pc-linux-gnu
Configured with: ../gcc-6.3.0/configure --prefix=/usr/local/gcc-6.3.0_accel 
--build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu 
--target=x86_64-pc-linux-gnu 
--enable-offload-targets=nvptx-none=/usr/local/gcc-6.3.0_accel/bin 
--with-cuda-driver=/usr/local/cuda/ --enable-languages=c,c++,fortran,lto 
--enable-nls --enable-threads=posix --with-gmp-lib=/usr/local/lib64 
--with-gmp-include=/usr/local/include --with-mpfr-lib=/usr/local/lib64 
--with-mpfr-include=/usr/local/include --with-mpc-lib=/usr/local/lib64 
--with-mpc-include=/usr/local/include --with-isl-lib=/usr/local/lib64 
--with-isl-include=/usr/local/include
Thread model: posix
gcc version 6.3.0 (GCC)
loki OpenACC 215

Is something wrong with my program, do I need different or additional
command line options, or does gcc-6.3.0 not support offloading to an NVIDIA
GPU with OpenMP device pragmas? Thank you very much in advance for any help.

Kind regards

Siegmar

/* Compile with:
 *   gcc -fopenmp -o dot_prod_accel_OpenMP_gcc dot_prod_accel_OpenMP.c
 */

#include <stdio.h>
#include <stdlib.h>
#ifdef _OPENMP
  #include <omp.h>
#endif

#define VECTOR_SIZE 100000000		/* vector size (10^8)		*/

/* static storage to avoid a segmentation fault due to a stack overflow	*/
double a[VECTOR_SIZE],			/* vectors for dot product	*/
       b[VECTOR_SIZE];

int main (void)
{
  double sum;

  /* initialize vectors							*/
  #pragma omp target map (from: a, b)
  #pragma omp parallel for default(none) shared(a, b)
  for (int i = 0; i < VECTOR_SIZE; ++i)
  {
    a[i] = 2.0;
    b[i] = 3.0;
  }

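  /* note: this code runs on the host, so omp_is_initial_device ()
   * reports the host here even if an offload device is available
   */
  #ifdef _OPENMP
    printf ("Number of processors:     %d\n"
	    "Number of devices:        %d\n"
	    "Default device:           %d\n"
	    "Execution on host device: %s\n",
	    omp_get_num_procs (), omp_get_num_devices (),
	    omp_get_default_device (),
	    (omp_is_initial_device ()) ? "yes" : "no");
  #endif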

  /* compute dot product						*/
  sum = 0.0;
  #pragma omp target map(to:a,b), map(tofrom:sum)
  #pragma omp parallel for default(none) shared(a, b) reduction(+:sum)
  for (int i = 0; i < VECTOR_SIZE; ++i)
  {
    sum += a[i] * b[i];
  }
  printf ("sum = %e\n", sum);
  return EXIT_SUCCESS;
}
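
As far as I understand the OpenMP specification, a "target" region
followed by "parallel for" runs with a single team on the device, so a
GPU would probably be used poorly even once offloading works. A hedged
sketch of the dot product with the combined "target teams distribute
parallel for" construct and explicit array sections follows. The
function name dot_product and its pointer parameters are assumptions
for illustration; the program above uses global arrays instead.

```c
#include <stddef.h>

/* Hypothetical helper: dot product of two vectors of length n. The
 * loop is offloaded if a device is available and otherwise runs on
 * the host.
 */
double dot_product (const double *a, const double *b, size_t n)
{
  double sum = 0.0;

  /* the array sections tell the runtime how many elements to copy;
   * "sum" is mapped tofrom so that the reduced value comes back
   */
  #pragma omp target teams distribute parallel for \
	  map(to: a[0:n], b[0:n]) map(tofrom: sum) reduction(+:sum)
  for (size_t i = 0; i < n; ++i)
  {
    sum += a[i] * b[i];
  }
  return sum;
}
```

Without -fopenmp the pragma is ignored and the loop runs sequentially,
so the function returns the same value in both cases.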