Re: prefetch question

Tim Prince <n8tm@xxxxxxx> · Mon, 18 Jan 2010 13:18:12 -0800

Thomas Witzel wrote:

The option -fprefetch-loop-arrays generates prefetch instructions for
non-vectorized code, but not for vectorized code. Is that the intended
behavior? Is there a way to get prefetching for the vectorized routines
as well?
Thanks, Thomas

Example (compiled with g++-4.5.0):
g++ -O3 -fcx-fortran-rules -fprefetch-loop-arrays -mtune=core2 -march=core2 -mssse3 -S -c ../test_loop.cpp

Consider a complex multiplication loop written this way:

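#include <complex>
#include <cstddef>
const std::size_t N = 1024; // hypothetical: N's value is not given in the post

void f(std::complex<float> *a, std::complex<float> *b, std::complex<float> *r)
{
        for(std::size_t s=0; s<N; s++)
                r[s] = a[s]*b[s];
}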

The code is generated in two versions, one vectorized (.L3) and one not (.L5).
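The split into a vectorized main loop and a scalar remainder loop can be pictured roughly at source level. This is a hypothetical sketch, not gcc's actual output (the real code may also peel iterations for alignment or test for aliasing); it assumes a 128-bit SSE register holds 4 floats, i.e. 2 std::complex<float>, and the names f_split and kVecWidth are made up for illustration.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>

// Hypothetical source-level picture of the two loops; NOT gcc's generated code.
// Assumes one 128-bit SSE register holds 4 floats = 2 std::complex<float>.
const std::size_t kVecWidth = 2;  // complex<float> elements per vector (assumed)

// Returns how many elements the "vector" main loop covered.
std::size_t f_split(const std::complex<float> *a, const std::complex<float> *b,
                    std::complex<float> *r, std::size_t n)
{
    std::size_t s = 0;
    // Main loop (stands in for .L3): only whole groups of kVecWidth elements.
    for (; s + kVecWidth <= n; s += kVecWidth)
        for (std::size_t k = 0; k < kVecWidth; ++k)
            r[s + k] = a[s + k] * b[s + k];
    const std::size_t vector_part = s;
    assert(vector_part % kVecWidth == 0);
    // Remainder loop (stands in for .L5): leftover elements, one at a time.
    for (; s < n; ++s)
        r[s] = a[s] * b[s];
    return vector_part;
}
```

For n = 5 the main loop covers 4 elements and the remainder loop handles the last one; per the observation above, that remainder loop is the only place the prefetches appeared.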

It's certainly hard to guess the effect of prefetching only in the
remainder loop (early 32-bit Pentium 4 style?). As you have set
-mtune=core2, it seems reasonable that the compiler would not optimize
for the 32-bit Athlon, which may have been the most recent common CPU
without effective hardware prefetch for vectorized loops.
I don't really expect gcc to attempt further optimization specific to 
-mssse3, now that it's about 2 years out of production.
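Neither message shows a manual workaround, so the following is only a sketch under stated assumptions: GCC's __builtin_prefetch(addr, rw, locality) builtin can request prefetches by hand. The function name f_prefetch, the block size kLine (8 elements, assuming 64-byte cache lines) and the prefetch distance kDistance are hypothetical choices, not values from the thread. Whether gcc 4.5 still vectorizes the inner loop, and whether the hints help at all on a Core 2 with hardware prefetch, would have to be checked in the -S output and by timing.

```cpp
#include <algorithm>
#include <cassert>
#include <complex>
#include <cstddef>

// Hypothetical tuning constants (not from the thread):
// a 64-byte cache line is assumed to hold 8 std::complex<float> (8 bytes each).
const std::size_t kLine = 8;       // elements per assumed cache line
const std::size_t kDistance = 64;  // prefetch distance in elements (a guess)

// Complex multiply with hand-placed GCC prefetch hints. The hints sit in the
// outer loop so that the inner loop stays a plain, vectorizable loop.
void f_prefetch(const std::complex<float> *a, const std::complex<float> *b,
                std::complex<float> *r, std::size_t n)
{
    assert(a && b && r);
    for (std::size_t base = 0; base < n; base += kLine) {
        const std::size_t ahead = base + kDistance;
        if (ahead < n) {  // only form addresses that lie inside the arrays
            __builtin_prefetch(a + ahead, 0, 3);  // will be read
            __builtin_prefetch(b + ahead, 0, 3);  // will be read
            __builtin_prefetch(r + ahead, 1, 3);  // will be written
        }
        const std::size_t end = std::min(base + kLine, n);
        for (std::size_t s = base; s < end; ++s)
            r[s] = a[s] * b[s];
    }
}
```

The prefetch calls are hints only, so the results are identical to the plain loop; any benefit depends on the CPU and the chosen distance.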