When compiled on GCC 5.1 using -std=c99, -O3, and -mavx2, the following code snippet auto-vectorizes:

    #include <stdint.h>

    void test(uint32_t *restrict a, uint32_t *restrict b) {
        uint32_t *a_aligned = __builtin_assume_aligned(a, 32);
        uint32_t *b_aligned = __builtin_assume_aligned(b, 32);

        for (int i = 0; i < (1L << 10); i += 2) {
            a_aligned[i] = 2 * b_aligned[i];
            a_aligned[i+1] = 3 * a_aligned[i+1];
        }
    }

But the following code snippet does not auto-vectorize:

    #include <stdint.h>

    void test(uint32_t *restrict a, uint32_t *restrict b) {
        uint32_t *a_aligned = __builtin_assume_aligned(a, 32);
        uint32_t *b_aligned = __builtin_assume_aligned(b, 32);

        for (int i = 0; i < (1L << 10); i += 2) {
            a_aligned[i] = 2 * b_aligned[i];
            a_aligned[i+1] = a_aligned[i+1];
        }
    }

This was also the case for GCC 4.8 and 4.9. Adding volatile to a_aligned's declaration inhibits auto-vectorization completely.

Is there a way to make the latter code snippet auto-vectorize?

Cheers, S.D.S.