Alexander,

I found the problem. In my code I use

#pragma acc kernels loop gang(100) vector(256)

but with GOMP_DEBUG=1 enabled, the output of my run is

nvptx_exec: prepare mappings
nvptx_exec: kernel main$_omp_fn$0: launch gangs=1, workers=1, vectors=32

The number of gangs is different from what I requested. How do I pass the
correct number of gangs to nvptx_exec?

On Fri, Jan 15, 2016 at 7:22 PM, Esteban Hernández <eshernan@xxxxxxxxx> wrote:
> Poor performance,
>
> Dear Alexander,
>
> My pi example runs now, but the processing time is very slow.
>
> For example, if I run the executable compiled with gcc, the time is
>
> real 0m7.626s
> user 0m5.574s
> sys 0m2.007s
>
>
> But if I compile the same code with pgi, the result is
>
> pi=3.1415926536
>
> real 0m0.269s
> user 0m0.008s
> sys 0m0.208s
>
>
> Can you help me understand which FLAGS parameters I need to use to
> investigate the performance problem?
>
>
> Thanks again
>
> On Thu, Jan 14, 2016 at 4:51 PM, Alexander Monakov <amonakov@xxxxxxxxx> wrote:
>> On Thu, 14 Jan 2016, Esteban Hernández wrote:
>>
>>> On Thu, Jan 14, 2016 at 4:35 PM, Esteban Hernández <eshernan@xxxxxxxxx> wrote:
>>> > Dear Alexander,
>>> >
>>> > I reviewed the code of the pi implementation, and the pi value is copied out:
>>> >
>>> >
>>> > #pragma acc data copyout (pi)
>>> > #pragma acc parallel vector_length (vl) reduction (+:pi)
>>> > for (i=0; i<N; i++) {
>>> > double t= (double)((i+0.5)/N);
>>> > pi +=4.0/(1.0+t*t);
>>> > }
>>> > printf("pi=%11.10f\n",pi/N);
>>> >
>>> > But when I run the program with strace, it appears to wait forever,
>>
>> It probably just takes a lot of time (it's running with only 1 worker and gang);
>> try decreasing N or adding num_workers/num_gangs clauses in addition to
>> vector_length.
>>
>> (I'm not sure if OpenACC requires running with 1 worker and gang in your
>> example, or GCC is behaving suboptimally -- please wait for comment from
>> OpenACC implementors in GCC)
>>
>> Alexander
>
>
>
> --
> Sincerely
>
>
> Esteban Hernandez B.
> HPC specialist

--
Sincerely

Esteban Hernandez B.
HPC specialist
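
For reference, here is a minimal sketch of the pi loop from this thread with the launch geometry requested explicitly through num_gangs and vector_length clauses on a parallel loop construct, along the lines Alexander suggested. The function name estimate_pi and the values 100 and 256 are assumptions for illustration (the numbers are taken from the gang(100)/vector(256) request above). Nothing in this thread confirms that GCC's nvptx backend will honor them.

```c
/* Hypothetical helper (not from the original program): midpoint-rule
   estimate of pi, i.e. the integral of 4/(1+t^2) over [0,1]. */
double estimate_pi(long n)
{
    double sum = 0.0;
    long i;

    /* Assumed launch geometry: 100 gangs, vector length 256.
       When compiled without OpenACC support the pragma is ignored
       and the loop simply runs serially on the host. */
#pragma acc parallel loop num_gangs(100) vector_length(256) reduction(+:sum)
    for (i = 0; i < n; i++) {
        double t = (i + 0.5) / (double)n;   /* midpoint of the i-th slice */
        sum += 4.0 / (1.0 + t * t);
    }

    return sum / (double)n;
}
```

Calling printf("pi=%11.10f\n", estimate_pi(N)) from main and running with GOMP_DEBUG=1 should show, in the nvptx_exec launch line, whether the requested gang count actually reaches the device.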