[OpenACC] libgomp: cuMemHostGetDevicePointer error: invalid device

Νίκος Αντωνιάδης <annikos@xxxxxxxxx> · Mon, 12 Oct 2015 20:27:32 +0300

I've downloaded and compiled the latest gcc:
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/home/nikos/gcc-test/install/bin/../libexec/gcc/x86_64-unknown-linux-gnu/5.2.1/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
Target: x86_64-unknown-linux-gnu
Configured with: /home/nikos/gcc-test/source/gcc/configure --prefix= 
--disable-bootstrap --enable-languages=c,c++,fortran,lto 
--disable-multilib 
--enable-offload-targets=nvptx-none=/home/nikos/gcc-test/install 
--with-cuda-driver-include=/usr/local/cuda-6.5/include CC='gcc -m64' 
CXX='g++ -m64' --with-sysroot=
Thread model: posix
gcc version 5.2.1 20151006 (GCC)

I followed the instructions at
http://scelementary.com/2015/04/25/openacc-in-gcc.html
to enable OpenACC support in gcc.

My machine (as reported by screenfetch):
OS: Antergos
Kernel: x86_64 Linux 4.2.2-1-ARCH
Uptime: 2h 50m
Packages: 912
Shell: bash 4.3.42
Resolution: 1920x1200
WM: GNOME Shell
WM Theme: Numix-Frost-Light
CPU: Intel Core i7 CPU 920 @ 2.793GHz
GPU: GeForce GTX 275
RAM: 2053MiB / 5971MiB

GCC built successfully, and so did the first test program (as it
appears on the site above):

#include <stdio.h>
#define N 2000000000
#define vl 1024

int main(void) {
  double pi = 0.0f;
  long long i;

  #pragma acc parallel vector_length(vl)
  #pragma acc loop reduction(+:pi)
  for (i=0; i<N; i++) {
    double t= (double)((i+0.5)/N);
    pi +=4.0/(1.0+t*t);
  }
  printf("pi=%11.10f\n",pi/N);
  return 0;
}
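
For reference, the loop in the test program is a midpoint-rule approximation of the integral of 4/(1+x^2) over [0,1], which equals pi. A minimal host-only sketch of the same sum, with no OpenACC pragmas, could serve as a baseline for the expected output once offloading works. The function name midpoint_pi and the parameter n are hypothetical and not part of the original program:

```c
/* Hypothetical helper (not from the original post): a serial, host-only
   version of the same midpoint-rule sum, with the slice count n passed
   as a parameter instead of the N macro. */
double midpoint_pi(long long n)
{
    double sum = 0.0;
    long long i;

    for (i = 0; i < n; i++) {
        double t = (i + 0.5) / (double)n;  /* midpoint of the i-th slice */
        sum += 4.0 / (1.0 + t * t);        /* integrand 4/(1+t^2) */
    }
    return sum / (double)n;                /* scale by the slice width 1/n */
}
```

Calling midpoint_pi with a much smaller n than the program's N (for example one million) should already agree with pi to many decimal places, so a working GPU build would be expected to print a value close to it.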

compiled with:

gcc pi.c -fopenacc -foffload=nvptx-none -foffload="-O3" -O3 -o gpu.x

When I ran:

time ./gpu.x

the result was:

libgomp: cuMemHostGetDevicePointer error: invalid device

real    0m0.063s
user    0m0.013s
sys    0m0.047s

I would like to mention that CUDA is working well on my machine.
Is this a misconfiguration or build mistake on my part when compiling
gcc, or a bug in libgomp?
Thank you in advance.