Nvptx GPU offloading using OpenMP4 and GCC 7.2

Hi, 

I have been trying to install the GCC 7.2 compiler with offload capabilities for nvptx, but so far without success.

I mainly followed https://gcc.gnu.org/wiki/Offloading and https://kristerw.blogspot.nl/2017/04/building-gcc-with-support-for-nvidia.html. The script I finally used for the build is attached (compileScript.sh). To understand the script, note that 'module load cuda/8.0.61' sets the CUDA_PATH variable. You can ignore the #SBATCH lines, which are only there because I ran the build as a batch job.

I've managed to build nvptx-tools, the GCC nvptx (accelerator) compiler and the GCC host compiler without errors. However, when I compile a minimal example of a for loop offloaded with an OpenMP 4 "#pragma omp target" directive (gcc -fopenmp -o openMP_GPU_minimal openMP_GPU_minimal.c), the compiler returns the following error:

gcc: warning: '-x lto' after last input file has no effect 
gcc: fatal error: no input files 

Attached you'll find the compiler output I get with the -v option; it may be more informative than the rather vague warning above.

I found this thread, https://gcc.gnu.org/ml/gcc-help/2016-04/msg00111.html, which deals with exactly the same issue, but the suggestion to install everything (nvptx-tools, the gcc host compiler and the gcc accelerator compiler) in the same <something>/install directory didn't help me: I already did that to begin with, as kristerw's script suggests.

Another thing I noticed is that if I add the -flto option, the compilation completes without errors. However, when I then call omp_is_initial_device() inside the #pragma omp target region, it returns 'true', indicating that the code is running on the host and not on the accelerator (GPU) as intended. Note that omp_get_num_devices() correctly returns 2 (there are 2 GPUs in the system), but I don't think this tells me whether I can successfully offload code: I believe omp_get_num_devices() is just host code, defined in libgomp.so. So at best, it tells me that I'm using a libgomp.so that supports detecting these accelerators.
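
For reference, the check described above can be sketched roughly as follows. This is a minimal sketch rather than my exact test program: the function names target_ran_on_host and report_offload_status are made up for illustration, and the #else stubs are only an assumption so that the file still builds when -fopenmp is not given.

```c
#include <stdio.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Assumption for illustration: without -fopenmp there is no offloading,
   so pretend we are always on the host and see no devices. */
static int omp_is_initial_device(void) { return 1; }
static int omp_get_num_devices(void) { return 0; }
#endif

/* Hypothetical helper: returns 1 if the target region executed on the
   host, 0 if it executed on an accelerator. */
int target_ran_on_host(void)
{
    int on_host = 1;

    /* The result is copied back from the device through the map clause. */
    #pragma omp target map(from: on_host)
    {
        on_host = omp_is_initial_device();
    }
    return on_host;
}

/* Hypothetical helper: prints what the host runtime reports and where
   the target region actually ran. */
void report_offload_status(void)
{
    /* omp_get_num_devices() is answered by the host runtime only. */
    printf("devices visible to libgomp: %d\n", omp_get_num_devices());
    printf("target region ran on: %s\n",
           target_ran_on_host() ? "host" : "accelerator");
}
```

Calling report_offload_status() from main() is enough to see both values. In my case the device count is 2 while the target region still reports the host.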

For the sake of completeness, let me also include the output of gcc -v for the host and accelerator compilers, so you can check if that makes sense. 

For the host compiler (gcc or x86_64-pc-linux-gnu-gcc, both return the same): 
Using built-in specs. 
COLLECT_GCC=/home/casparl/GCC_with_nvptx/work/install/bin/gcc 
COLLECT_LTO_WRAPPER=/nfs/home4/casparl/GCC_with_nvptx/work/install/bin/../libexec/gcc/x86_64-pc-linux-gnu/7.2.0/lto-wrapper 
OFFLOAD_TARGET_NAMES=nvptx-none 
Target: x86_64-pc-linux-gnu 
Configured with: ../gcc-7.2.0/configure --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --target=x86_64-pc-linux-gnu --enable-offload-targets=nvptx-none=/home/casparl/GCC_with_nvptx/work/install --with-cuda-driver-include=/hpc/sw/cuda/8.0.61//include --with-cuda-driver-lib=/hpc/sw/cuda/8.0.61//lib64 --disable-multilib --enable-languages=c,c++,fortran,lto --prefix=/home/casparl/GCC_with_nvptx/work/install 
Thread model: posix 
gcc version 7.2.0 (GCC) 

For the accelerator compiler (x86_64-pc-linux-gnu-accel-nvptx-none-gcc -v): 
Using built-in specs. 
COLLECT_GCC=x86_64-pc-linux-gnu-accel-nvptx-none-gcc 
COLLECT_LTO_WRAPPER=/nfs/home4/casparl/GCC_with_nvptx/work/install/bin/../libexec/gcc/x86_64-pc-linux-gnu/7.2.0/accel/nvptx-none/lto-wrapper 
Target: nvptx-none 
Configured with: ../gcc-7.2.0/configure --target=nvptx-none --with-build-time-tools=/home/casparl/GCC_with_nvptx/work/install/nvptx-none/bin --enable-as-accelerator-for=x86_64-pc-linux-gnu --disable-sjlj-exceptions --enable-newlib-io-long-long --disable-multilib --enable-languages=c,c++,fortran,lto --prefix=/home/casparl/GCC_with_nvptx/work/install 
Thread model: single 
gcc version 7.2.0 (GCC) 

I'm afraid I don't have enough insight into what the gcc warning indicates (e.g. whether the problem is with the configure options of my host or accelerator compilers, or with which compilers/linkers are invoked, etc.). Any help to get me going is greatly appreciated, because I've exhausted all potential solutions I could think of (and I'd love to give OpenMP offloading a try!).

Cheers, 

Caspar van Leeuwen

Attachment: compileScript.sh
Description: application/shellscript

Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/nfs/home4/casparl/GCC_with_nvptx/work/install/bin/../libexec/gcc/x86_64-pc-linux-gnu/7.2.0/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
Target: x86_64-pc-linux-gnu
Configured with: ../gcc-7.2.0/configure --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --target=x86_64-pc-linux-gnu --enable-offload-targets=nvptx-none=/home/casparl/GCC_with_nvptx/work/install --with-cuda-driver-include=/hpc/sw/cuda/8.0.61//include --with-cuda-driver-lib=/hpc/sw/cuda/8.0.61//lib64 --disable-multilib --enable-languages=c,c++,fortran,lto --prefix=/home/casparl/GCC_with_nvptx/work/install
Thread model: posix
gcc version 7.2.0 (GCC) 
COLLECT_GCC_OPTIONS='-v' '-fopenmp' '-o' 'openMP_GPU_minimal' '-mtune=generic' '-march=x86-64' '-pthread'
 /nfs/home4/casparl/GCC_with_nvptx/work/install/bin/../libexec/gcc/x86_64-pc-linux-gnu/7.2.0/cc1plus -quiet -v -iprefix /nfs/home4/casparl/GCC_with_nvptx/work/install/bin/../lib/gcc/x86_64-pc-linux-gnu/7.2.0/ -D_GNU_SOURCE -D_REENTRANT openMP_GPU_minimal.cpp -quiet -dumpbase openMP_GPU_minimal.cpp -mtune=generic -march=x86-64 -auxbase openMP_GPU_minimal -version -fopenmp -o /scratch-local/casparl/ccq7H39A.s
GNU C++14 (GCC) version 7.2.0 (x86_64-pc-linux-gnu)
	compiled by GNU C version 7.2.0, GMP version 6.1.0, MPFR version 3.1.4, MPC version 1.0.3, isl version isl-0.16.1-GMP

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
ignoring nonexistent directory "/nfs/home4/casparl/GCC_with_nvptx/work/install/bin/../lib/gcc/x86_64-pc-linux-gnu/7.2.0/../../../../x86_64-pc-linux-gnu/include"
ignoring duplicate directory "/nfs/home4/casparl/GCC_with_nvptx/work/install/bin/../lib/gcc/../../lib/gcc/x86_64-pc-linux-gnu/7.2.0/../../../../include/c++/7.2.0"
ignoring duplicate directory "/nfs/home4/casparl/GCC_with_nvptx/work/install/bin/../lib/gcc/../../lib/gcc/x86_64-pc-linux-gnu/7.2.0/../../../../include/c++/7.2.0/x86_64-pc-linux-gnu"
ignoring duplicate directory "/nfs/home4/casparl/GCC_with_nvptx/work/install/bin/../lib/gcc/../../lib/gcc/x86_64-pc-linux-gnu/7.2.0/../../../../include/c++/7.2.0/backward"
ignoring duplicate directory "/nfs/home4/casparl/GCC_with_nvptx/work/install/bin/../lib/gcc/../../lib/gcc/x86_64-pc-linux-gnu/7.2.0/include"
ignoring duplicate directory "/nfs/home4/casparl/GCC_with_nvptx/work/install/bin/../lib/gcc/../../lib/gcc/x86_64-pc-linux-gnu/7.2.0/include-fixed"
ignoring nonexistent directory "/nfs/home4/casparl/GCC_with_nvptx/work/install/bin/../lib/gcc/../../lib/gcc/x86_64-pc-linux-gnu/7.2.0/../../../../x86_64-pc-linux-gnu/include"
#include "..." search starts here:
#include <...> search starts here:
 /nfs/home4/casparl/GCC_with_nvptx/work/install/bin/../lib/gcc/x86_64-pc-linux-gnu/7.2.0/../../../../include/c++/7.2.0
 /nfs/home4/casparl/GCC_with_nvptx/work/install/bin/../lib/gcc/x86_64-pc-linux-gnu/7.2.0/../../../../include/c++/7.2.0/x86_64-pc-linux-gnu
 /nfs/home4/casparl/GCC_with_nvptx/work/install/bin/../lib/gcc/x86_64-pc-linux-gnu/7.2.0/../../../../include/c++/7.2.0/backward
 /nfs/home4/casparl/GCC_with_nvptx/work/install/bin/../lib/gcc/x86_64-pc-linux-gnu/7.2.0/include
 /nfs/home4/casparl/GCC_with_nvptx/work/install/bin/../lib/gcc/x86_64-pc-linux-gnu/7.2.0/include-fixed
 /usr/local/include
 /nfs/home4/casparl/GCC_with_nvptx/work/install/bin/../lib/gcc/../../include
 /usr/include
End of search list.
GNU C++14 (GCC) version 7.2.0 (x86_64-pc-linux-gnu)
	compiled by GNU C version 7.2.0, GMP version 6.1.0, MPFR version 3.1.4, MPC version 1.0.3, isl version isl-0.16.1-GMP

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: e1c6664aa09631308fbc84919a484c03
COLLECT_GCC_OPTIONS='-v' '-fopenmp' '-o' 'openMP_GPU_minimal' '-mtune=generic' '-march=x86-64' '-pthread'
 as -v --64 -o /scratch-local/casparl/ccLp9mZT.o /scratch-local/casparl/ccq7H39A.s
GNU assembler version 2.20.51.0.2 (x86_64-redhat-linux) using BFD version version 2.20.51.0.2-5.47.el6_9.1 20100205
COMPILER_PATH=/nfs/home4/casparl/GCC_with_nvptx/work/install/bin/../libexec/gcc/x86_64-pc-linux-gnu/7.2.0/:/nfs/home4/casparl/GCC_with_nvptx/work/install/bin/../libexec/gcc/
LIBRARY_PATH=/nfs/home4/casparl/GCC_with_nvptx/work/install/bin/../lib/gcc/x86_64-pc-linux-gnu/7.2.0/:/nfs/home4/casparl/GCC_with_nvptx/work/install/bin/../lib/gcc/:/nfs/home4/casparl/GCC_with_nvptx/work/install/bin/../lib/gcc/x86_64-pc-linux-gnu/7.2.0/../../../../lib64/:/lib/../lib64/:/usr/lib/../lib64/:/nfs/home4/casparl/GCC_with_nvptx/work/install/bin/../lib/gcc/x86_64-pc-linux-gnu/7.2.0/../../../:/lib/:/usr/lib/
Reading specs from /nfs/home4/casparl/GCC_with_nvptx/work/install/bin/../lib/gcc/x86_64-pc-linux-gnu/7.2.0/../../../../lib64/libgomp.spec
COLLECT_GCC_OPTIONS='-v' '-fopenmp' '-o' 'openMP_GPU_minimal' '-mtune=generic' '-march=x86-64' '-pthread'
 /nfs/home4/casparl/GCC_with_nvptx/work/install/bin/../libexec/gcc/x86_64-pc-linux-gnu/7.2.0/collect2 --eh-frame-hdr -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2 -o openMP_GPU_minimal /usr/lib/../lib64/crt1.o /usr/lib/../lib64/crti.o /nfs/home4/casparl/GCC_with_nvptx/work/install/bin/../lib/gcc/x86_64-pc-linux-gnu/7.2.0/crtbegin.o /nfs/home4/casparl/GCC_with_nvptx/work/install/bin/../lib/gcc/x86_64-pc-linux-gnu/7.2.0/crtoffloadbegin.o -L/nfs/home4/casparl/GCC_with_nvptx/work/install/bin/../lib/gcc/x86_64-pc-linux-gnu/7.2.0 -L/nfs/home4/casparl/GCC_with_nvptx/work/install/bin/../lib/gcc -L/nfs/home4/casparl/GCC_with_nvptx/work/install/bin/../lib/gcc/x86_64-pc-linux-gnu/7.2.0/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/nfs/home4/casparl/GCC_with_nvptx/work/install/bin/../lib/gcc/x86_64-pc-linux-gnu/7.2.0/../../.. /scratch-local/casparl/ccLp9mZT.o -lgomp -lgcc --as-needed -lgcc_s --no-as-needed -lpthread -lc -lgcc --as-needed -lgcc_s --no-as-needed /nfs/home4/casparl/GCC_with_nvptx/work/install/bin/../lib/gcc/x86_64-pc-linux-gnu/7.2.0/crtend.o /usr/lib/../lib64/crtn.o /nfs/home4/casparl/GCC_with_nvptx/work/install/bin/../lib/gcc/x86_64-pc-linux-gnu/7.2.0/crtoffloadend.o
gcc @/scratch-local/casparl/ccx4OFuQ
gcc: warning: '-x lto' after last input file has no effect
Using built-in specs.
COLLECT_GCC=gcc
OFFLOAD_TARGET_NAMES=nvptx-none
Target: x86_64-pc-linux-gnu
Configured with: ../gcc-7.2.0/configure --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --target=x86_64-pc-linux-gnu --enable-offload-targets=nvptx-none=/home/casparl/GCC_with_nvptx/work/install --with-cuda-driver-include=/hpc/sw/cuda/8.0.61//include --with-cuda-driver-lib=/hpc/sw/cuda/8.0.61//lib64 --disable-multilib --enable-languages=c,c++,fortran,lto --prefix=/home/casparl/GCC_with_nvptx/work/install
Thread model: posix
gcc version 7.2.0 (GCC) 
/usr/lib/../lib64/crt1.o: In function `_start':
(.text+0x20): undefined reference to `main'
collect2: fatal error: ld returned 1 exit status
#include <omp.h>

void gemm_OpenMP_GPU(float *A, float *B, float *C,
                 const int A_rows, const int A_cols, const int B_rows)
{
  int i, j, k;
  #pragma omp target teams distribute parallel for collapse(2) schedule(static,1) shared(A, B, C) private(i, j, k)
  for (i = 0; i < A_rows; i++)
  {
    for (k=0; k<A_cols; k++)
    {
      for (j = 0; j < B_rows; j++)
      {
        C[i*B_rows + j] += A[i*A_cols+k] * B[k*B_rows+j];
      }
    }
  }
}


int main(int argc, char **argv)
{
  int A_rows=5;
  int A_cols=5;
  int B_rows=5;
  int B_cols=5;

  double dtime;

  float A[5][5] = { {0,0,1,4,5}, {1,2,7,8,3}, {2,4,1,7,8}, {3,6,2,5,6}, {4,8,7,2,1} };
  float B[5][5] = { {0,0,1,4,5}, {1,2,7,8,3}, {2,4,1,7,8}, {3,6,2,5,6}, {4,8,7,2,1} };
  float C[5][5] = { {0,0,0,0,0}, {0,0,0,0,0}, {0,0,0,0,0}, {0,0,0,0,0}, {0,0,0,0,0} };

  dtime = omp_get_wtime();
  gemm_OpenMP_GPU(&A[0][0], &B[0][0], &C[0][0], A_rows, A_cols, B_cols);
  dtime = omp_get_wtime() - dtime;
//  std::cout << "Time with OpenMp: " << dtime << std::endl;

  return 0;
}