Tim Prince wrote:
burlen wrote:
Can loops with a non-unit stride be automagically optimized by the compiler
with SSE?
template <int nComp>
void norm(double *result, double *data, size_t n)
{
  double *pDat=data;
  double *pRes=result;
  for (size_t i=0; i<n; ++i)
  {
    *pRes=(*pDat)*(*pDat);
    for (int j=1; j<nComp; ++j)
    {
      *pRes+=pDat[j]*pDat[j];
    }
    *pRes=sqrt(*pRes);
    pRes+=1;
    pDat+=nComp;
  }
}
Your inner loop appears to have unit stride, and might be optimized easily
if you didn't write it with potential aliases. If you meant
inner_product(), why not use that?
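One possible reading of the two suggestions above, as a hedged sketch rather than anyone's actual code: the pointers are declared non-aliasing, and each sum of squares is computed with std::inner_product and stored once instead of being accumulated through *pRes. The name norm_local is hypothetical, and __restrict__ is a GNU extension (other compilers spell it differently), so this is an assumption about what "without potential aliases" could look like, not a portable recipe.

```cpp
#include <cmath>
#include <cstddef>
#include <numeric>

// Hypothetical variant of norm(): same interleaved data layout, but the
// compiler is told that result and data do not overlap, and each result
// element is written exactly once.
template <int nComp>
void norm_local(double *__restrict__ result,
                const double *__restrict__ data,
                std::size_t n)
{
  for (std::size_t i = 0; i < n; ++i)
  {
    // Start of the i-th vector: nComp consecutive doubles.
    const double *p = data + i * nComp;
    // Sum of squares over the components, accumulated in a temporary.
    result[i] = std::sqrt(std::inner_product(p, p + nComp, p, 0.0));
  }
}
```

Called as norm_local<3>(result, data, n), it computes the same values as the original norm<3>. Whether g++ then emits SSE code for it still depends on the compiler version and flags.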
The inner loop does have unit stride, but it's usually short, between 1 and 12
iterations, while the outer loop is usually long, in the tens to hundreds of
thousands. That example is just one simple situation that I encounter. I want
to understand how the compiler applies SSE optimization. What can g++
optimize with SSE automatically? Is this documented somewhere?
I want to write code in a way that takes advantage of g++'s capabilities. It's
important for me to let g++ do the optimization because the code needs to be
cross-platform.
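For illustration only, here is a sketch of the loop shape that auto-vectorizers generally handle most readily: a long, countable loop with unit stride and no possible aliasing. It assumes a different data layout from the one in the question, with all first components stored contiguously, then all second components, and so on. The layout, the name norm_soa, and the use of the GNU __restrict__ extension are assumptions for this example, not something proposed in the thread.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical layout: data holds nComp blocks of n doubles each,
// so component j of vector i lives at data[j*n + i].
template <int nComp>
void norm_soa(double *__restrict__ result,
              const double *__restrict__ data,
              std::size_t n)
{
  for (std::size_t i = 0; i < n; ++i)
    result[i] = 0.0;

  // The short loop over components is now the outer loop.
  for (int j = 0; j < nComp; ++j)
  {
    const double *comp = data + static_cast<std::size_t>(j) * n;
    // Long unit-stride loop: a candidate for vectorization.
    for (std::size_t i = 0; i < n; ++i)
      result[i] += comp[i] * comp[i];
  }

  for (std::size_t i = 0; i < n; ++i)
    result[i] = std::sqrt(result[i]);
}

// To see what g++ actually did, recent GCC releases can report it, e.g.
//   g++ -O3 -msse2 -fopt-info-vec file.cpp
// (-ftree-vectorize is enabled by -O3; the reporting flag differs on
// older releases).
```

Whether any given loop is vectorized depends on the g++ version and target, so the compiler's own report is the thing to check rather than this sketch.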
I know Gmail is fashionable, but there's plenty of reason for it going in
the spam box, and no effort at Google to improve the situation.
Sorry, but that's all I've got at the moment.
Thanks
Burlen