On Monday 12 October 2009 19:32:18 Ian Lance Taylor wrote:
> Thomas Heller <thomas.heller1@xxxxxx> writes:
> > I ran into a little issue when trying to force inlining with
> > __attribute__(( always_inline )). The reason why i am trying to force
> > the compiler to inline my code is simple: I want to implement handwritten
> > optimizations using SSE intrinsics. However it seems that gcc is not
> > willing to inline that code anymore. That is why i came up with the idea
> > of trying to force gcc.
> > There is a problem now. I get tons of error messages that gcc is not able
> > to inline that function.
> > Example error message:
> > sorry, unimplemented: inlining failed in call to ‘const
> > pe::Vector3<typename pe::MathTrait<T1, T2, true>::AddType>
> > pe::operator+(const pe::Vector3<Type>&, const pe::Vector3<T2>&) [with T1
> > = double, T2 = double]’: function not inlinable
> >
> > I was trying to put that in a little testcase. However it seems that i
> > can't reproduce that error with a small code base.
> > Any ideas? Need more information?
>
> This question is not appropriate for the gcc@xxxxxxxxxxx mailing
> list. It would be appropriate for gcc-help@xxxxxxxxxxxx Please take
> any followups to gcc-help. Thanks.

Sorry for the inconvenience.

> Unfortunate, it's basically impossible for us to say anything useful
> without some sort of test case.

I have attached some of the code from my vector classes. For example,
operator+ fails to get inlined when compiled with -DUSE_SSE. Unfortunately,
this does not happen in small examples, and since I don't know what exactly
causes the problem, I cannot isolate it. My code makes heavy use of the
inline keyword itself.

> I assume you are using the SSE intrinsics from mmintrin.h and
> friends. Those intrinsics should be reliably inlined. Why is it
> necessary for you to inline them further?

Yes, I am using mmintrin.h and friends. The functions I want to inline are
called very often in an inner loop.
The reason I want to inline them is simple: on simple test cases (where
inlining actually works), my hand-optimized version is around 20% faster than
the unoptimized one. However, when running my real application I get a
performance drop of 20% with my optimized version. Upon investigation I found
that gcc inlined the unoptimized function calls but not my optimized ones, for
whatever reason. That is why I think it is necessary to inline further.

> For whatever it's worth, the development version of gcc gives better
> messages about why a function can not be inlined.

Thanks, I will try that.

Thomas

#ifdef USE_SSE
#include <emmintrin.h>   // SSE2: __m128d, _mm_add_pd, _mm_add_sd
#endif

template< typename T >
class Vector3
{
public:
   Vector3( T x, T y, T z ) { v_[0] = x; v_[1] = y; v_[2] = z; }
#ifdef USE_SSE
   Vector3( __m128d xy, __m128d zw ) { xy_() = xy; zw_() = zw; }
#endif

   T&       operator[] ( int i )       { return v_[i]; }
   const T& operator[] ( int i ) const { return v_[i]; }

private:
#ifndef USE_SSE
   T v_[3];
#else
   // Four elements, 16-byte aligned, so the data can be viewed as two __m128d.
   T v_[4] __attribute__((aligned (16)));

   inline __m128d&       xy_()       { return *reinterpret_cast<__m128d *>( &v_[0] ); }
   inline const __m128d& xy_() const { return *reinterpret_cast<__m128d const *>( &v_[0] ); }
   inline __m128d&       zw_()       { return *reinterpret_cast<__m128d *>( &v_[2] ); }
   inline const __m128d& zw_() const { return *reinterpret_cast<__m128d const *>( &v_[2] ); }
#endif

   // The friend template needs its own parameter name (U, not the class's T).
   template< typename U >
   friend const Vector3<U> operator+( const Vector3<U>&, const Vector3<U>& );
};

// this works
template< typename T >
__attribute__(( always_inline ))
const Vector3<T> operator+( const Vector3<T>& v1, const Vector3<T>& v2 )
{
   return Vector3<T>( v1[0] + v2[0], v1[1] + v2[1], v1[2] + v2[2] );
}

// functions like these don't work
#ifdef USE_SSE
template<>
__attribute__(( always_inline ))
const Vector3<double> operator+( const Vector3<double>& v1, const Vector3<double>& v2 )
{
   return Vector3<double>( _mm_add_pd( v1.xy_(), v2.xy_() ),
                           _mm_add_sd( v1.zw_(), v2.zw_() ) );
}
#endif
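
One thing that might be worth trying: the GCC manual describes always_inline
in terms of functions that are declared inline, and an explicit specialization
of a function template is only inline if it says so itself. The sketch below
is not the attached code, and I do not know whether this is the actual cause
of the failure. Vec3d and add_vec3d are made-up names for illustration. It
only shows the SSE addition with both `inline` and the attribute on the same
function, with a scalar fallback for targets without SSE2.

```cpp
#include <cassert>
#if defined(__SSE2__)
#include <emmintrin.h>   // SSE2: __m128d, _mm_load_pd, _mm_add_pd, _mm_add_sd
#endif

// Hypothetical stand-in for Vector3<double>; not the real pe::Vector3.
struct Vec3d
{
   // Three components plus one padding lane, 16-byte aligned for SSE loads.
   double v[4] __attribute__((aligned (16)));
};

// Assumption being illustrated: `inline` is spelled out next to always_inline.
inline __attribute__(( always_inline ))
Vec3d add_vec3d( const Vec3d& a, const Vec3d& b )
{
   Vec3d r;
#if defined(__SSE2__)
   // x and y are added as a pair.
   _mm_store_pd( &r.v[0], _mm_add_pd( _mm_load_pd( &a.v[0] ), _mm_load_pd( &b.v[0] ) ) );
   // _mm_add_sd adds only the low lane (z); the padding lane is copied from a.
   _mm_store_pd( &r.v[2], _mm_add_sd( _mm_load_pd( &a.v[2] ), _mm_load_pd( &b.v[2] ) ) );
#else
   // Scalar fallback with the same result layout.
   for( int i = 0; i < 3; ++i ) r.v[i] = a.v[i] + b.v[i];
   r.v[3] = a.v[3];
#endif
   return r;
}

// Small self-check: adds two vectors and compares the three real components.
inline bool add_vec3d_selfcheck()
{
   Vec3d a = { { 1.0, 2.0, 3.0, 0.0 } };
   Vec3d b = { { 10.0, 20.0, 30.0, 0.0 } };
   Vec3d r = add_vec3d( a, b );
   return r.v[0] == 11.0 && r.v[1] == 22.0 && r.v[2] == 33.0;
}
```

If the real operator+ specialization still fails with `inline` added, the
better diagnostics in the development version of gcc mentioned above are
probably the next thing to look at.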