Hi Alexander, thanks for your reply.

On Tue, Jun 28, 2022 at 9:06 PM Alexander Monakov <amonakov@xxxxxxxxx> wrote:
> On Mon, 27 Jun 2022, Adonis Ling via Gcc-help wrote:
>
> > Hi all,
> >
> > Recently, I met an issue with auto vectorization.
> >
> > As following code shows, why uint32_t prevents the compiler (GCC 12.1 + O3)
> > from optimizing by auto vectorization. See https://godbolt.org/z/a3GfaKEq6.
> >
> > #include <cstdint>
> >
> > // no auto vectorization
> > void test32(uint32_t *array, uint32_t &nread, uint32_t from, uint32_t to) {
> >   for (uint32_t i = from; i < to; i++) {
> >     array[nread++] = i;
> >   }
> > }
>
> Here the main problem is '*array' and 'nread' have the same type, so they might
> overlap. Ideally the compiler would recognize that that cannot happen because it
> would make 'array[nread++] = i' undefined due to unsequenced modifications, but
> GCC is not sufficiently smart (yet). The secondary issue is the same as below:
>

I see your point. I then tried adding __restrict__ to nread, as shown below,
but GCC still doesn't vectorize the loop.

#include <cstdint>

// no auto vectorization
void test32(uint32_t *array, uint32_t & __restrict__ nread, uint32_t from, uint32_t to) {
  for (uint32_t i = from; i < to; i++) {
    array[nread++] = i;
  }
}

However, when I compiled it with Clang, I noticed that Clang does vectorize it.
See https://godbolt.org/z/eEz9W7o9z .

> > // no auto vectorization
> > void test_another_32(uint32_t *array, uint32_t &nread, uint32_t from, uint32_t to) {
> >   uint32_t index = nread;
> >   for (uint32_t i = from; i < to; i++) {
> >     array[index++] = i;
> >   }
> >   nread = index;
> > }
>
> ... here: the issue is that index is unsigned and shorter than pointer type, it
> can wrap around from 0xffffffff to 0, making the access non-consecutive. When
> you compile for 32-bit x86, this loop is vectorized.
>
> Alexander
>

Clang also vectorizes this function. See https://godbolt.org/z/eEz9W7o9z .

--
Best regards,
Adonis
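A minimal sketch of the workaround implied by Alexander's explanation of test_another_32: keep the running index pointer-width (std::size_t) so it cannot wrap at 32 bits, which removes the possibility of non-consecutive stores. The names test_wide_index and next_index are hypothetical, and whether GCC 12.1 at -O3 actually vectorizes this version is an assumption to confirm on Compiler Explorer, not a result reported in this thread.

```cpp
#include <cassert>  // for assert in the usage example
#include <cstddef>
#include <cstdint>

// Hypothetical variant of test_another_32: the running index is a
// pointer-width std::size_t instead of uint32_t. On a 64-bit target it cannot
// wrap from 0xffffffff back to 0, so iteration k always stores to
// array[nread + k], i.e. the accesses are consecutive.
// Assumption: the caller guarantees nread + (to - from) stays within the
// array (the same precondition the original function needs).
void test_wide_index(uint32_t *array, uint32_t &nread, uint32_t from, uint32_t to) {
    std::size_t index = nread;
    for (uint32_t i = from; i < to; i++) {
        array[index++] = i;
    }
    // Truncation gives the same final value as the uint32_t version would.
    nread = static_cast<uint32_t>(index);
}

// Hypothetical helper showing the wraparound Alexander describes: a uint32_t
// index at 0xffffffff steps to 0, so the next store in test_another_32 would
// go to array[0] rather than to the element after array[0xffffffff].
uint32_t next_index(uint32_t index) {
    return index + 1u;
}
```

Note that the two functions differ only when the 32-bit index would actually wrap; that corner case is exactly what prevents GCC from treating the uint32_t version's stores as consecutive.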