The following code is a simplified version of what someone doing C++
metaprogramming would hand the optimizer, for example via
boost::mpl::for_each().

    #include <iostream>

    void foo(int i){
        std::cout<<i<<std::endl;
    }

    template<int N>
    void bar(){
        bar<N-1>();
        foo(N);
    }

    template<>
    void bar<-1>(){}

    int main(){
        bar<1024>();
        return 0;
    }

gcc used to generate a bunch of out-of-line bar<N>() instances that could
have been inlined, but weren't.

    g++ -O3 -ftemplate-depth=1034 template_recurision_ctest.cpp
    objdump -f -d -C a.out | grep bar | grep ">:"

    08048700 <void bar<-1>()>:
    08048720 <void bar<19>()>:
    08048fb0 <void bar<39>()>:
    08049840 <void bar<59>()>:
    0804a0d0 <void bar<79>()>:
    0804a960 <void bar<99>()>:
    0804b1f0 <void bar<159>()>:
    0804cb60 <void bar<179>()>:
    0804d3f0 <void bar<319>()>:
    0804fc00 <void bar<339>()>:
    08050490 <void bar<959>()>:
    080535a0 <void bar<979>()>:
    08053e30 <void bar<1019>()>:

The only way to prevent this was to turn on LTO, which then realized that
each of these functions was called from only one place and could be
inlined.

g++ 4.7.1 gets it right and inlines all the calls to foo() into main().
Awesome!!! What changed? Who do I thank?

Even cooler, the debug information lets gdb attribute the code to a stack
of 1000 bar()s.

Chris
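P.S. For anyone who wants to check that the recursion really calls foo() with 0, 1, ..., N in that order without scrolling through the printed output, here is a minimal sketch of the same pattern. It is not the test case above: foo() records its argument in a vector instead of printing, the depth is cut to 16 so no -ftemplate-depth flag is needed, and calls() and bar_calls_in_order() are hypothetical helper names made up for this illustration.

```cpp
#include <cassert>   // lets callers assert on the result
#include <cstddef>
#include <vector>

// Hypothetical helper: storage for the arguments foo() was called with.
inline std::vector<int>& calls() {
    static std::vector<int> v;
    return v;
}

// Stand-in for the original foo(): records i instead of printing it.
inline void foo(int i) {
    calls().push_back(i);
}

// Same shape as the original: recurse first, then call foo(N),
// so foo() sees 0, 1, ..., N in ascending order.
template<int N>
void bar() {
    bar<N - 1>();
    foo(N);
}

// Terminating specialization, as in the original.
template<>
inline void bar<-1>() {}

// Hypothetical helper: runs bar<16>() and checks the recorded sequence.
inline bool bar_calls_in_order() {
    calls().clear();
    bar<16>();
    if (calls().size() != 17) return false;
    for (std::size_t i = 0; i < calls().size(); ++i) {
        if (calls()[i] != static_cast<int>(i)) return false;
    }
    return true;
}
```

Calling bar_calls_in_order() from main() should return true whether or not the compiler inlines the bar<N>() chain; inlining changes the generated code, not the order of the calls.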