On Mon, 27 Jun 2022, Adonis Ling via Gcc-help wrote:

> Hi all,
>
> Recently, I met an issue with auto vectorization.
>
> As following code shows, why uint32_t prevents the compiler (GCC 12.1 + O3)
> from optimizing by auto vectorization. See https://godbolt.org/z/a3GfaKEq6.
>
> #include <cstdint>
>
> // no auto vectorization
> void test32(uint32_t *array, uint32_t &nread, uint32_t from, uint32_t to) {
>   for (uint32_t i = from; i < to; i++) {
>     array[nread++] = i;
>   }
> }

Here the main problem is that '*array' and 'nread' have the same type, so they
might overlap. Ideally the compiler would recognize that this cannot happen,
because an overlap would make 'array[nread++] = i' undefined due to unsequenced
modifications, but GCC is not smart enough for that (yet).

The secondary issue is the same as below:

> // no auto vectorization
> void test_another_32(uint32_t *array, uint32_t &nread, uint32_t from,
>                      uint32_t to) {
>   uint32_t index = nread;
>   for (uint32_t i = from; i < to; i++) {
>     array[index++] = i;
>   }
>   nread = index;
> }

... here: the issue is that 'index' is unsigned and narrower than the pointer
type, so it can wrap around from 0xffffffff to 0, making the accesses
non-consecutive. When you compile for 32-bit x86, where uint32_t is as wide as
a pointer, this loop is vectorized.

Alexander
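A minimal sketch of one way to sidestep both issues, assuming a 64-bit target. The function name test_wide_index is hypothetical. Copying the counter into a local removes the possible overlap between '*array' and 'nread', and a pointer-width std::size_t index cannot wrap inside the loop, so the stores stay consecutive. I have not checked the exact GCC 12.1 output for this version. Compiling with -O3 -fopt-info-vec (or looking at the godbolt link above) shows whether the loop was actually vectorized.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical workaround sketch, not taken from the thread.
// - The local 'index' cannot alias *array (fixes the test32 problem).
// - std::size_t is pointer-width, so 'index++' cannot wrap from 0xffffffff
//   to 0 inside the loop on a 64-bit target (fixes the test_another_32 one).
// Assumption: nread + (to - from) never exceeds UINT32_MAX. If it did, the
// original uint32_t code would wrap back to array[0], whereas this version
// would keep writing past array[0xffffffff].
void test_wide_index(uint32_t *array, uint32_t &nread, uint32_t from,
                     uint32_t to) {
  std::size_t index = nread;
  for (uint32_t i = from; i < to; i++) {
    array[index++] = i;
  }
  // Narrowing matches what the uint32_t counter would have held.
  nread = static_cast<uint32_t>(index);
}
```

The behaviour matches the original functions whenever the assumption holds. A __restrict qualifier on the parameters alone would address only the aliasing part, not the wrap-around of a 32-bit index.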