Poor performance

Dear Alexander,

My pi example is running now, but the processing time is very slow. For example, if I run the executable compiled with gcc, the time is

    real 0m7.626s
    user 0m5.574s
    sys  0m2.007s

But if I compile the same code with pgi and time it, the result is

    pi=3.1415926536
    real 0m0.269s
    user 0m0.008s
    sys  0m0.208s

Can you help me work out which compiler flags I need to use to address the performance problem?

Thanks again

On Thu, Jan 14, 2016 at 4:51 PM, Alexander Monakov <amonakov@xxxxxxxxx> wrote:
> On Thu, 14 Jan 2016, Esteban Hernández wrote:
>
>> On Thu, Jan 14, 2016 at 4:35 PM, Esteban Hernández <eshernan@xxxxxxxxx> wrote:
>> > Dear Alexander,
>> >
>> > I reviewed the code of the pi implementation, and the pi value is copyout:
>> >
>> >
>> > #pragma acc data copyout (pi)
>> > #pragma acc parallel vector_length (vl) reduction (+:pi)
>> > for (i=0; i<N; i++) {
>> >   double t= (double)((i+0.5)/N);
>> >   pi +=4.0/(1.0+t*t);
>> > }
>> > printf("pi=%11.10f\n",pi/N);
>> >
>> > But when I run the program with strace, it appears to wait forever,
>
> It probably just takes a lot of time (it's running with only 1 worker and 1 gang);
> try decreasing N or adding num_workers/num_gangs clauses in addition to
> vector_length.
>
> (I'm not sure if OpenACC requires running with 1 worker and gang in your
> example, or GCC is behaving suboptimally -- please wait for comment from
> OpenACC implementors in GCC)
>
> Alexander

--
Sincerely
Esteban Hernandez B.
HPC specialist
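
For reference, here is a minimal sketch of what the suggestion quoted above (adding num_gangs alongside vector_length) might look like for the pi loop. The function name compute_pi, the accumulator name sum, and the sizes 256 and 128 are illustrative assumptions, not values from this thread. The sketch also uses the combined `parallel loop` construct so that iterations are explicitly shared out, which differs from the original snippet. Whether this removes the slowdown under GCC is not confirmed here.

```c
/* Midpoint-rule estimate of pi as the integral of 4/(1+x^2) over [0,1].
   Built without OpenACC support, the pragma is ignored and the loop
   simply runs serially, so the result can be checked on any compiler. */
double compute_pi(long n)
{
    double sum = 0.0;

    /* Hypothetical launch sizes (256 gangs, vectors of 128): these are
       placeholders to tune for the target device, not recommendations. */
    #pragma acc parallel loop num_gangs(256) vector_length(128) reduction(+:sum)
    for (long i = 0; i < n; i++) {
        double t = (i + 0.5) / n;      /* midpoint of the i-th subinterval */
        sum += 4.0 / (1.0 + t * t);
    }

    return sum / n;                    /* scale by the subinterval width 1/n */
}
```

Calling it as printf("pi=%11.10f\n", compute_pi(N)); from main mirrors the original output line. A num_workers clause can be added to the same directive in the same way.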