On Thu, 14 Jan 2016, Esteban Hernández wrote:
> On Thu, Jan 14, 2016 at 4:35 PM, Esteban Hernández <eshernan@xxxxxxxxx> wrote:
> > Dear Alexander,
> >
> > I reviewed the code of the pi implementation, and the pi value is copied out:
> >
> > #pragma acc data copyout (pi)
> > #pragma acc parallel vector_length (vl) reduction (+:pi)
> > for (i=0; i<N; i++) {
> >   double t= (double)((i+0.5)/N);
> >   pi +=4.0/(1.0+t*t);
> > }
> > printf("pi=%11.10f\n",pi/N);
> >
> > But when I run the program with strace, the result is waiting forever,

It probably just takes a long time (it's running with only 1 worker and
1 gang). Try decreasing N, or add num_workers/num_gangs clauses in
addition to vector_length.

(I'm not sure whether OpenACC requires running with 1 worker and 1 gang in
your example, or whether GCC is behaving suboptimally; please wait for a
comment from the OpenACC implementors in GCC.)

Alexander