Re: inlining problems

Thomas Heller <thomas.heller1@xxxxxx> · Mon, 12 Oct 2009 19:55:28 +0200



On Monday 12 October 2009 19:32:18 Ian Lance Taylor wrote:
> Thomas Heller <thomas.heller1@xxxxxx> writes:
> > I ran into a little issue when trying to force inlining with
> > __attribute__(( always_inline )). The reason why I am trying to force
> > the compiler to inline my code is simple: I want to implement handwritten
> > optimizations using SSE intrinsics. However, it seems that gcc is not
> > willing to inline that code anymore. That is why I came up with the idea
> > of trying to force gcc.
> > There is a problem now. I get tons of error messages that gcc is not able
> > to inline that function.
> > Example error message:
> > sorry, unimplemented: inlining failed in call to ‘const
> > pe::Vector3<typename pe::MathTrait<T1, T2, true>::AddType>
> > pe::operator+(const pe::Vector3<Type>&, const pe::Vector3<T2>&) [with T1
> > = double, T2 = double]’: function not inlinable
> >
> > I was trying to put that in a little testcase. However, it seems that I
> > can't reproduce that error with a small code base.
> > Any ideas? Need more information?
> 
> This question is not appropriate for the gcc@xxxxxxxxxxx mailing
> list.  It would be appropriate for gcc-help@xxxxxxxxxxxx  Please take
> any followups to gcc-help.  Thanks.
Sorry for the inconvenience.

> Unfortunately, it's basically impossible for us to say anything useful
> without some sort of test case.
I attached some of my vector class code. For example, operator+ fails to
get inlined if compiled with -DUSE_SSE. Unfortunately, this does not happen
in small examples.
Since I don't know what exactly causes the problem, I cannot isolate it.
My code makes heavy use of the inline keyword itself.

> I assume you are using the SSE intrinsics from mmintrin.h and
> friends.  Those intrinsics should be reliably inlined.  Why is it
> necessary for you to inline them further?
Yes, I am using mmintrin.h and friends.
The functions I want to inline are called very often in an inner loop.
The reason why I want to inline them is simple:
On simple testcases (where inlining actually works), my hand-optimized
version is around 20% faster than the unoptimized one.
However, when running my real application I get a performance drop
of 20% with my optimized version. Upon investigation I found that
gcc inlined the unoptimized function calls but not my optimized ones, for
whatever reason.
That is why I think it is necessary to inline further.
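
For reference, a minimal sketch of the kind of forced inlining meant here,
assuming GCC on x86 with SSE2 enabled. The names add_pairs and add2 are
hypothetical and not part of the attached code. The attribute is combined
with the inline keyword, since GCC may warn when always_inline is used on a
function that is not declared inline.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics: __m128d, _mm_add_pd, _mm_loadu_pd, ...

// Hypothetical helper: adds two packed pairs of doubles.
// always_inline asks GCC to inline the call regardless of its heuristics.
inline __attribute__(( always_inline ))
__m128d add_pairs( __m128d a, __m128d b )
{
	return _mm_add_pd( a, b );
}

// Hypothetical caller: adds two arrays of two doubles each into out.
// Unaligned loads and stores are used so any double pointer is accepted.
inline void add2( const double* x, const double* y, double* out )
{
	_mm_storeu_pd( out, add_pairs( _mm_loadu_pd( x ), _mm_loadu_pd( y ) ) );
}
```

Compiling such a file with -Winline should additionally make GCC warn about
functions declared inline that it did not inline.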

> For whatever it's worth, the development version of gcc gives better
> messages about why a function can not be inlined.
Thanks, I will try that.

Thomas

#if USE_SSE
#include <emmintrin.h>  // SSE2: __m128d, _mm_add_pd, _mm_add_sd
#endif

template< typename T >
class Vector3
{
	public:
		Vector3( T x, T y, T z )
		{
			v_[0] = x;
			v_[1] = y;
			v_[2] = z;
		}

#if USE_SSE
		Vector3( __m128d xy, __m128d zw )
		{
			xy_() = xy;
			zw_() = zw;
		}
#endif

		T& operator[] ( int i )
		{
			return v_[i];
		}
		const T& operator[] ( int i ) const
		{
			return v_[i];
		}

	private:
#ifndef USE_SSE
		T v_[3];
#else
		T v_[4] __attribute__((aligned (16)));  // padded to 4 so xy_/zw_ each cover one __m128d
		
		inline __m128d& xy_() { return *reinterpret_cast<__m128d *>( &v_[0] ); }
		inline const __m128d& xy_() const { return *reinterpret_cast<__m128d const *>( &v_[0] ); }
		
		inline __m128d& zw_() { return *reinterpret_cast<__m128d *>( &v_[2] ); }
		inline const __m128d& zw_() const { return *reinterpret_cast<__m128d const *>( &v_[2] ); }
#endif

		// U avoids shadowing the class template parameter T
		template< typename U >
		friend const Vector3<U> operator+( const Vector3<U>&, const Vector3<U>& );
};


// this works
template< typename T >
__attribute__(( always_inline ))
const Vector3<T> operator+( const Vector3<T>& v1, const Vector3<T>& v2 )
{
	return Vector3<T>( v1[0] + v2[0], v1[1] + v2[1], v1[2] + v2[2] );
}


// functions like these don't work
#if USE_SSE
template<>
__attribute__(( always_inline ))
const Vector3<double> operator+( const Vector3<double>& v1, const Vector3<double>& v2 )
{
	return Vector3<double>( _mm_add_pd( v1.xy_(), v2.xy_() ),
                           _mm_add_sd( v1.zw_(), v2.zw_() ) );
}
#endif
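
As a possible starting point for a standalone testcase, here is a reduced
sketch of the same data layout without templates. It assumes GCC on x86 with
SSE2; Vec3d is a hypothetical type, not the class from the application, and
it uses aligned loads and stores instead of reinterpret_cast. It is not known
whether this reduction reproduces the inlining failure.

```cpp
#include <emmintrin.h>  // SSE2: __m128d, _mm_load_pd, _mm_add_pd, _mm_store_pd

// Hypothetical reduced type: three doubles padded to four and 16-byte
// aligned, so elements [0..1] and [2..3] each fill one SSE2 register.
struct Vec3d
{
	double v[4] __attribute__(( aligned( 16 ) ));

	Vec3d( double x, double y, double z )
	{
		v[0] = x; v[1] = y; v[2] = z; v[3] = 0.0;  // padding element
	}
};

// Forced-inline addition. Both register halves are added with packed
// operations; the padding elements are 0 in both operands and stay 0.
inline __attribute__(( always_inline ))
Vec3d operator+( const Vec3d& a, const Vec3d& b )
{
	Vec3d r( 0.0, 0.0, 0.0 );
	_mm_store_pd( &r.v[0], _mm_add_pd( _mm_load_pd( &a.v[0] ), _mm_load_pd( &b.v[0] ) ) );
	_mm_store_pd( &r.v[2], _mm_add_pd( _mm_load_pd( &a.v[2] ), _mm_load_pd( &b.v[2] ) ) );
	return r;
}
```

The aligned load and store intrinsics require 16-byte aligned addresses,
which the aligned attribute on the member array is meant to guarantee.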