Hi,

On Tue, Nov 24, 2009 at 07:44:16PM +0000, Nava Whiteford wrote:
> Hi,
>
> Are template classes faster than using a virtual derived class via a
> pointer to its base class?

Yes - the call itself is faster, and the functions can be inlined.

> Intuition tells me it should be faster, and reasonably significantly so.
> However I put together the following test:
>
> http://linuxjunk.blogspot.com/2009/11/are-templates-faster-than-subclasses.html
>
> Which suggests to me that there isn't really a lot in it.

In your example, most of the time is spent INSIDE the function (during
cout), not while calling the function.

Try the most extreme case I can imagine - doing nothing in the functions:

    struct base {
        virtual ~base() {}           // so that "delete b" below is well-defined
        virtual void Do(int i) = 0;
    };

    struct derived : base {
        virtual void Do(int i) {}
    };

    template <typename T>
    struct Template {
        void Do(int i) {}
    };

    int main() {
        Template<int> *t = new Template<int>();
        base *b = new derived();

        for (int j = 0; j < 100000; ++j)
            for (int i = 0; i < 100000; ++i) {
                // uncomment one of the two following lines
                // t->Do(i);
                b->Do(i);
            }

        delete t;
        delete b;
    }

With g++ 4.3.2 and "-O3", I get the following on my machine:

- when using "t->Do(i)": runtime 0:00s (probably the compiler inlines
  the function, realizes that nothing is happening, and optimizes both
  for-loops away).
- when using "b->Do(i)": 50.54s (my explanation: as the compiler can't
  inline the virtual function, it generates code which calls Do(i)
  10^10 times).

Of course, that is a really extreme case, which will probably never
appear in real code. So in reality, I expect template code to always be
faster than virtual functions - but the speed difference only matters
if the time spent inside the function is negligible compared to the
time needed to call the function.

In addition: when you use extern template instantiations, inlining the
functions becomes more difficult (I don't know whether g++ can inline
then?)

On the other hand: in this trivial case, the actual class behind the
base pointer is known at compile time.
So in theory, the compiler could inline the functions, realize that my
code is doing nothing, and optimize the derived case down to a runtime
of 0:00s as well. Maybe some newer version of gcc is already capable of
this optimization?

HTH,
Axel