Hi David,

See here: https://godbolt.org/g/Qi1Szx

The problem is that _mm_store_pd is implemented in such a way that GCC and
clang assume it can alias any memory. In f0, the compiler cannot tell
whether the store modifies the index value. In f1 it can, because the index
is a local variable.

It seems ICC does not assume this kind of aliasing for _mm_store_pd.

Cheers,
  Matthias

On Tuesday, 11 July 2017 18:16:43 CEST David Pfander wrote:
> Hello everyone,
>
> I have some trouble understanding GCC's code generation. I'm accessing
> different components of a big array. (Or in more detail: I'm writing a
> struct-of-array abstraction in C++.) Now, I want to read different
> components with minimal overhead, because this access takes place in an
> extremely hot loop:
>
> [...]
> m[0] = expansions_SoA.value<0>(flat_index);
> m[1] = expansions_SoA.value<1>(flat_index);
> [...]
>
> The expansions_SoA is an instance of a class that has the accessed array
> as a member:
>
> template <typename component_type, size_t num_components, size_t entries>
> class struct_of_array_data {
> private:
>     component_type* const data;
>     [...]
> };
>
> The array pointer is initialized, used and absolutely never changed (and
> finally deleted[]).
>
> If you now look at a screenshot of VTune (or gdb) output here:
> http://imgur.com/a/7YJBC
> You'll see that there is a suspicious mov instruction
>
> movq (%rax), %rdx
>
> over and over again. This reloads the "data" member, so basically it
> reloads the same value.
>
> My question is: Why isn't this value just kept in registers (it is
> reused immediately)? (How) can I get rid of the duplicate load?
>
> Best regards,
> David Pfander

-- 
──────────────────────────────────────────────────────────────────────────
 Dr. Matthias Kretz                            https://kretzfamily.de
 GSI Helmholtzzentrum für Schwerionenforschung             https://gsi.de
 SIMD easy and portable                     https://github.com/VcDevel/Vc
──────────────────────────────────────────────────────────────────────────
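
A minimal sketch of the effect described in the reply, and of one possible
workaround. This is not the code behind the godbolt link: soa_view,
copy_reload and copy_cached are hypothetical names, and the byte stores
done by memcpy stand in for _mm_store_pd so that the example compiles
without SSE headers. Whether a particular compiler really reloads the
pointer depends on its version and flags, so the generated assembly should
be checked.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for the struct-of-array class in the question.
struct soa_view {
    double* const data;  // set once, never changed
};

// Reads s.data inside the loop. The byte stores done by memcpy may alias
// anything, including s.data itself, so the compiler may reload the
// pointer on every iteration.
void copy_reload(const soa_view& s, unsigned char* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        std::memcpy(out + i * sizeof(double), &s.data[i], sizeof(double));
    }
}

// Copies the pointer into a local first. The local's address never
// escapes, so no store can modify it and it can stay in a register.
void copy_cached(const soa_view& s, unsigned char* out, std::size_t n) {
    double* const data = s.data;
    for (std::size_t i = 0; i < n; ++i) {
        std::memcpy(out + i * sizeof(double), &data[i], sizeof(double));
    }
}
```

Both functions produce the same bytes. The only intended difference is
that copy_cached gives the compiler a pointer it can prove is unchanged by
the stores, which mirrors the f0 versus f1 distinction above.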