Hi Ruoyao,

I see. Thanks very much for your time and detailed explanation!

Best Regards
Nan Xiao

On Tue, Nov 21, 2017 at 4:56 PM, Xi Ruoyao <ryxi@xxxxxxxxxxxxxxxxx> wrote:
> On 2017-11-21 16:50 +0800, Xi Ruoyao wrote:
>> On 2017-11-21 15:49 +0800, Nan Xiao wrote:
>> >
>> > #include <omp.h>
>> > #include <stdio.h>
>> >
>> > int main(void) {
>> >     #pragma omp parallel for
>> >     for (auto i = 0; i < 10; i++) {
>> >         int sum = 0;
>> >         #pragma omp taskloop shared(sum)
>> >         for (auto j = 0; j < 1000000; j++) {
>> >             sum += j;
>> >         }
>> >         printf("%d\n", sum);
>> >     }
>> >     return 0;
>> > }
>>
>> There are two bugs in your code. First, signed overflow is undefined
>> behaviour and may generate an arbitrary result. Second, the accesses to
>> the shared variable sum race, so the result may vary with scheduling.
>
> Fix:
>
> #pragma omp parallel for
> for (auto i = 0; i < 10; i++) {
>     long long sum = 0;
>     #pragma omp taskloop shared(sum)
>     for (auto j = 0; j < 1000000; j++) {
>         __atomic_add_fetch(&sum, j, __ATOMIC_RELAXED);
>     }
>     printf("%lld\n", sum);
> }
>
> This would generate a "lock addq" instruction for "sum", instead of
> loading it into a register.
> --
> Xi Ruoyao <ryxi@xxxxxxxxxxxxxxxxx>
> School of Aerospace Science and Technology, Xidian University
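
For anyone reading this thread later, here is a minimal sketch of the two points above, under some assumptions. The helper names expected_sum, sum_fits_in_int and taskloop_sum are hypothetical and not from the thread. The taskloop is also wrapped in "parallel" plus "single" rather than the original "parallel for", purely to keep the example small. The atomic builtin is the GCC one quoted above. If the file is built without -fopenmp the pragmas are ignored and the loop simply runs serially.

```c
#include <limits.h>

/* Closed-form value of 0 + 1 + ... + (n - 1), used as a reference result. */
long long expected_sum(long long n) {
    return n * (n - 1) / 2;
}

/* Returns 1 if the reference result fits in an int, 0 otherwise.
 * For n = 1000000 it does not fit, which is the signed-overflow bug. */
int sum_fits_in_int(long long n) {
    return expected_sum(n) <= INT_MAX;
}

/* Sums 0..n-1 with a taskloop. Every task updates the shared accumulator
 * through an atomic add, so concurrent updates are not lost.
 * Assumption: one thread generates the tasks (single) and the rest of the
 * team helps to execute them. */
long long taskloop_sum(int n) {
    long long sum = 0;
    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp taskloop shared(sum)
        for (int j = 0; j < n; j++) {
            __atomic_add_fetch(&sum, j, __ATOMIC_RELAXED);
        }
    }
    return sum;
}
```

Calling taskloop_sum(1000000) from main and printing the result with "%lld" should agree with expected_sum(1000000). A single atomic add per iteration is correct but serializes the updates, so this sketch is about correctness, not speed.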