Hi all,

I have a function which I wish to accelerate with auto-vectorization and OpenMP:

    void fn(float *restrict rho_in, float *restrict E_in,
            float *restrict rhou_in, float *restrict rhov_in,
            float *restrict f0rho_out, float *restrict f0E_out,
            float *restrict f0rhou_out, float *restrict f0rhov_out,
            float *restrict f1rho_out, float *restrict f1E_out,
            float *restrict f1rhou_out, float *restrict f1rhov_out,
            int n)
    {
        rho_in     = (float *) __builtin_assume_aligned(rho_in, 32);
        E_in       = (float *) __builtin_assume_aligned(E_in, 32);
        rhou_in    = (float *) __builtin_assume_aligned(rhou_in, 32);
        rhov_in    = (float *) __builtin_assume_aligned(rhov_in, 32);
        f0rho_out  = (float *) __builtin_assume_aligned(f0rho_out, 32);
        f0E_out    = (float *) __builtin_assume_aligned(f0E_out, 32);
        f0rhou_out = (float *) __builtin_assume_aligned(f0rhou_out, 32);
        f0rhov_out = (float *) __builtin_assume_aligned(f0rhov_out, 32);
        f1rho_out  = (float *) __builtin_assume_aligned(f1rho_out, 32);
        f1E_out    = (float *) __builtin_assume_aligned(f1E_out, 32);
        f1rhou_out = (float *) __builtin_assume_aligned(f1rhou_out, 32);
        f1rhov_out = (float *) __builtin_assume_aligned(f1rhov_out, 32);

        #pragma omp parallel for
        for (int i = 0; i < n; ++i)
        {
            float rho = rho_in[i], E = E_in[i];
            float rhou = rhou_in[i], rhov = rhov_in[i];
            float invrho = 1.0f/rho;
            float u = invrho*rhou, v = invrho*rhov;
            float p = 0.4f*(E - 0.5f*(rhou*u + rhov*v));

            f0rho_out[i] = rhou;
            f1rho_out[i] = rhov;
            f0rhou_out[i] = rhou*u + p;
            f1rhou_out[i] = rhov*u;
            f0rhov_out[i] = rhou*v;
            f1rhov_out[i] = rhov*v + p;
            f0E_out[i] = (E + p)*u;
            f1E_out[i] = (E + p)*v;
        }
    }

The combination of "restrict" along with the alignment fluff yields some extremely tight ASM on my AVX-capable system. However, when OpenMP enters the mix the resulting code is not vectorized:

    gcc-4.7.2 -std=c99 -Ofast -fopenmp -march=native -S fn.c

as can be seen by a simple inspection of the resulting assembly.

I believe this is due to Bug 46032 (although some of the comments imply that it should be fixed). It appears as if either the "restrict" property or the alignment information is getting clobbered when the OpenMP 'inner' function is generated.

Can anyone suggest any workarounds? It seems like a common problem and I really do not want to reinvent the wheel if a simple refactoring of my code can iron everything out.

Regards,
Freddie.
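
P.S. For concreteness, the kind of refactoring I have in mind is sketched below. It is only a sketch and I have not verified that it restores vectorization on 4.7.2. The idea is to move the loop into an ordinary serial function, so that "restrict" and the alignment hints are stated on that function's own parameters rather than inside the OpenMP-generated one, and to hand each thread a contiguous chunk whose start is a multiple of 8 floats (32 bytes). The names fn_chunk and fn_split are made up for illustration, and the kernel is cut down to two inputs and one output with the pressure term dropped.

```c
#include <assert.h>
#include <stdint.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical serial kernel, cut down to one output for brevity.
   restrict and the alignment hints are stated here, in an ordinary
   function, rather than inside the OpenMP-outlined loop body. */
__attribute__((noinline))
static void fn_chunk(const float *restrict rho_in,
                     const float *restrict rhou_in,
                     float *restrict f0rhou_out, int n)
{
    rho_in     = __builtin_assume_aligned(rho_in, 32);
    rhou_in    = __builtin_assume_aligned(rhou_in, 32);
    f0rhou_out = __builtin_assume_aligned(f0rhou_out, 32);

    for (int i = 0; i < n; ++i) {
        float invrho = 1.0f / rho_in[i];
        float u = invrho * rhou_in[i];
        f0rhou_out[i] = rhou_in[i] * u;  /* pressure term omitted */
    }
}

/* Hypothetical driver: splits [0, n) into one chunk per thread. */
void fn_split(const float *rho_in, const float *rhou_in,
              float *f0rhou_out, int n)
{
    /* The chunking below only preserves alignment if the bases are aligned. */
    assert(((uintptr_t) rho_in & 31) == 0);
    assert(((uintptr_t) rhou_in & 31) == 0);
    assert(((uintptr_t) f0rhou_out & 31) == 0);

    #pragma omp parallel
    {
#ifdef _OPENMP
        int nt = omp_get_num_threads(), tid = omp_get_thread_num();
#else
        int nt = 1, tid = 0;
#endif
        /* Round the chunk size up to a multiple of 8 floats so that
           every chunk start stays 32-byte aligned. */
        int chunk = ((n + nt - 1) / nt + 7) & ~7;
        int begin = tid * chunk;
        int end   = (begin + chunk < n) ? begin + chunk : n;

        if (begin < end)
            fn_chunk(rho_in + begin, rhou_in + begin,
                     f0rhou_out + begin, end - begin);
    }
}
```

The noinline attribute is there so that the kernel is not inlined back into the outlined parallel region, which I assume would bring the original problem back. Whether it is actually needed is something I would have to check in the assembly.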