Hi,
I have installed gcc-6.3.0 with support for OpenACC and OpenMP offloading
to an NVIDIA Quadro K2200 on my "SUSE Linux Enterprise Server 12.2 (x86_64)".
We use CUDA Toolkit 8.0. I've implemented a small program that computes
a dot product with OpenMP device pragmas. Unfortunately, the program
doesn't detect my NVIDIA GPU: it reports zero devices.
loki OpenACC 272 gcc -fopenmp dot_prod_accel_OpenMP.c
loki OpenACC 273 a.out
Number of processors: 24
Number of devices: 0
Default device: 0
Execution on host device: yes
sum = 6.000000e+08
loki OpenACC 274
"gcc" was configured as follows (output of "gcc -v").
loki OpenACC 214 gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/local/gcc-6.3.0_accel/libexec/gcc/x86_64-pc-linux-gnu/6.3.0/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
Target: x86_64-pc-linux-gnu
Configured with: ../gcc-6.3.0/configure --prefix=/usr/local/gcc-6.3.0_accel
--build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu
--target=x86_64-pc-linux-gnu
--enable-offload-targets=nvptx-none=/usr/local/gcc-6.3.0_accel/bin
--with-cuda-driver=/usr/local/cuda/ --enable-languages=c,c++,fortran,lto
--enable-nls --enable-threads=posix --with-gmp-lib=/usr/local/lib64
--with-gmp-include=/usr/local/include --with-mpfr-lib=/usr/local/lib64
--with-mpfr-include=/usr/local/include --with-mpc-lib=/usr/local/lib64
--with-mpc-include=/usr/local/include --with-isl-lib=/usr/local/lib64
--with-isl-include=/usr/local/include
Thread model: posix
gcc version 6.3.0 (GCC)
loki OpenACC 215
Is something wrong with my program, do I need different/more command
line options, or doesn't gcc support offloading to an NVIDIA GPU with
OpenMP device pragmas? Thank you very much in advance for any help.
Kind regards
Siegmar
/* gcc -fopenmp -o dot_prod_accel_OpenMP_gcc dot_prod_accel_OpenMP.c
*
*/
#include <stdio.h>
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#endif
#define VECTOR_SIZE 100000000 /* vector size (10^8) */
/* static storage (not the stack) to avoid a segmentation fault due to
 * a stack overflow
 */
double a[VECTOR_SIZE],                  /* vectors for dot product */
       b[VECTOR_SIZE];
int main (void)
{
  double sum;
  /* initialize vectors */
  #pragma omp target map (from: a, b)
  #pragma omp parallel for default(none) shared(a, b)
  for (int i = 0; i < VECTOR_SIZE; ++i)
  {
    a[i] = 2.0;
    b[i] = 3.0;
  }
#ifdef _OPENMP
  printf ("Number of processors: %d\n"
          "Number of devices: %d\n"
          "Default device: %d\n"
          "Execution on host device: %s\n",
          omp_get_num_procs (), omp_get_num_devices (),
          omp_get_default_device (),
          (omp_is_initial_device ()) ? "yes" : "no");
#endif
  /* compute dot product */
  sum = 0.0;
  #pragma omp target map(to:a,b), map(tofrom:sum)
  #pragma omp parallel for default(none) shared(a, b) reduction(+:sum)
  for (int i = 0; i < VECTOR_SIZE; ++i)
  {
    sum += a[i] * b[i];
  }
  printf ("sum = %e\n", sum);
  return EXIT_SUCCESS;
}