Is there a way to optimize the following C++ method?

    // dot product
    template <class T>
    T Vector<T>::operator*(const Vector& v) const
    {
        T ret_val = 0;
        for (std::size_t i = 0; i < m_Dim; i++)
            ret_val += m_data[i] * v.m_data[i];
        return ret_val;
    }

(m_Dim is around 500-1000.)

I am especially interested in the 32-bit float case, e.g. using SSE. Or do you think g++ will already make the best of it?

Thank you.
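To make the SSE idea concrete, here is a rough sketch of what a hand-vectorized version of the float case might look like. It is only an illustration under a few assumptions: `dot_sse` is a hypothetical free function (not part of the `Vector` class), the data is stored as contiguous `float` arrays, the target is x86 with SSE, and no alignment is assumed, so unaligned loads are used.

```cpp
#include <cstddef>
#include <xmmintrin.h>  // SSE intrinsics (x86 only)

// Hypothetical helper, not part of the Vector class in the question.
// Computes the dot product of two float arrays of length n.
float dot_sse(const float* a, const float* b, std::size_t n)
{
    __m128 acc = _mm_setzero_ps();  // four partial sums, one per lane
    std::size_t i = 0;

    // Main loop: multiply and accumulate four floats per iteration.
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);  // unaligned load, no alignment assumed
        __m128 vb = _mm_loadu_ps(b + i);
        acc = _mm_add_ps(acc, _mm_mul_ps(va, vb));
    }

    // Horizontal sum: store the four lanes and add them up.
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    float sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];

    // Scalar tail for the remaining 0-3 elements.
    for (; i < n; ++i)
        sum += a[i] * b[i];

    return sum;
}
```

Because the partial sums are accumulated in a different order than in the scalar loop, the result can differ slightly in the last bits. That reordering is also why g++ normally will not vectorize a floating-point reduction like this on its own unless it is allowed to reassociate (for example with -ffast-math), so it is worth checking the generated assembly and measuring before committing to hand-written intrinsics.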