burlen wrote:
> Can loops with a non-unit stride be automagically optimized by compiler
> with SSE?
>
> template <int nComp>
> void norm(double *result, double *data, size_t n)
> {
>   double *pDat=data;
>   double *pRes=result;
>
>   for (size_t i=0; i<n; ++i)
>   {
>     *pRes=*pDat**pDat;
>     for (int j=1; j<nComp; ++j)
>     {
>       *pRes+=pDat[j]*pDat[j];
>     }
>     *pRes=sqrt(*pRes);
>
>     pRes+=1;
>     pDat+=nComp;
>   }
> }

Your inner loop appears to have unit stride, and might be optimized easily if you didn't write it with potential aliases. If you meant inner_product(), why not use that?

I know gmail is fashionable, but there's plenty of reason for it going in the spam box, and no effort at google to improve the situation.
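
To illustrate the aliasing point: in the quoted code every `*pRes+=...` is a store through a `double*`, so the compiler has to allow for `result` overlapping `data` and reload `pDat[j]` after each store. A minimal sketch of two ways around that follows, assuming `result` and `data` never overlap. The names `norm_local` and `norm_inner_product` are made up for this example and are not from the original post. Whether either version actually gets vectorized depends on the compiler and flags; floating-point sum reductions in particular often need relaxed-math options before a compiler will reorder them.

```cpp
#include <cmath>
#include <cstddef>
#include <numeric>

// Hypothetical variant of the quoted norm(): accumulate in a local variable.
// Assumption: result and data do not overlap.
template <int nComp>
void norm_local(double *result, const double *data, std::size_t n)
{
    static_assert(nComp > 0, "need at least one component");
    for (std::size_t i = 0; i < n; ++i)
    {
        const double *p = data + i * nComp;  // start of the i-th tuple
        // The running sum lives in a local, so no store to result[]
        // happens inside the inner loop and no reload of p[j] is forced.
        double sum = 0.0;
        for (int j = 0; j < nComp; ++j)
        {
            sum += p[j] * p[j];
        }
        result[i] = std::sqrt(sum);
    }
}

// Hypothetical variant using std::inner_product from <numeric>.
// The 0.0 initial value makes the accumulation happen in double.
template <int nComp>
void norm_inner_product(double *result, const double *data, std::size_t n)
{
    static_assert(nComp > 0, "need at least one component");
    for (std::size_t i = 0; i < n; ++i)
    {
        const double *p = data + i * nComp;
        result[i] = std::sqrt(std::inner_product(p, p + nComp, p, 0.0));
    }
}
```

Called as, say, `norm_local<3>(result, data, n)`, both versions still step through `data` with stride `nComp` in the outer loop; only the inner loop has unit stride.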