On 14/01/13 16:04, Tim Prince wrote:
> It's a Frequently Encountered Problem.  What did
> -ftree-vectorizer-verbose=3 produce?

Nothing.  At 5 it gave:

27: versioning for alias required: can't determine dependence between
*D.1967_20 and *D.1988_49
27: mark for run-time aliasing test between *D.1967_20 and *D.1988_49
[...]
27: disable versioning for alias - max number of generated checks exceeded.

which implies that "restrict" is being clobbered.

> Part of the problem is that the OpenMP chunks won't have the
> alignments you set carefully for the start of the array, unless the
> loop count happens to be a multiple of number of threads times
> unrolling factor times vector register width, thus unknown at compile
> time.  It remains to be seen how much OpenMP 4.0 proposals for pragmas
> to deal with this may help.  Until then, OpenMP tends to work better
> with at least 2 levels of loops, where the outer is parallelizable
> and the inner vectorizable.

Okay.  Can anyone suggest a good blocking methodology such that given

for (int i = 0; i < n; ++i)
    // Code which uses parameters ...

where we require the parameters ... have an alignment of X 'items' (so
for 256-bit AVX registers and float types X = 32/4 = 8) yields:

for (outer)
    for (inner)
        // Code

such that the outer loop can be hit with OpenMP and the inner loop with
auto-vectorization.

Regards, Freddie.
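
For concreteness, one possible shape of the blocking asked about above,
as a rough sketch only.  The function name saxpy_blocked, the block size
BLOCK and the y += a*x kernel are all made up for illustration.  It
assumes x and y do not alias and both start on a 32-byte boundary, and
it keeps BLOCK a multiple of X = 8 floats so that every block start
inherits that alignment.  __builtin_assume_aligned is a GCC builtin
(4.7 and later).  Whether re-declaring the pointers as restrict inside
the parallel body is enough to stop the alias versioning reported above
is an assumption, not something verified here.

```c
#include <stddef.h>

/* Hypothetical block size in floats; must be a multiple of 8
   (8 floats = 32 bytes = one 256-bit AVX register). */
#define BLOCK 1024

/* y[i] += a * x[i] for 0 <= i < n.
   Assumes x and y are 32-byte aligned and do not overlap. */
void saxpy_blocked(int n, float a, const float *x, float *y)
{
    int nblocks = (n + BLOCK - 1) / BLOCK;

    /* Outer loop: whole blocks are handed out to the OpenMP threads. */
    #pragma omp parallel for
    for (int b = 0; b < nblocks; ++b) {
        int start = b * BLOCK;
        int len = (n - start < BLOCK) ? n - start : BLOCK;

        /* Every block starts a multiple of 32 bytes past the array
           base, so the alignment promise holds per block. */
        const float *restrict xb = __builtin_assume_aligned(x + start, 32);
        float *restrict yb = __builtin_assume_aligned(y + start, 32);

        /* Inner loop: unit stride over one block, left to the
           auto-vectorizer.  Only the last block can be short. */
        for (int i = 0; i < len; ++i)
            yb[i] += a * xb[i];
    }
}
```

Built without -fopenmp the pragma is ignored and the same code runs
serially, which makes it easy to compare the vectorizer's report with
and without OpenMP.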