On gcc trunk, the performance culprit is in gcc/cp/call.c, function build_over_call, at:

      if (undeduced_auto_decl (fn))
        mark_used (fn, complain);   // <= this guy, from gcc-7-branch r249333
      else
        /* Otherwise set TREE_USED for the benefit of -Wunused-function.
           See PR80598.  */
        TREE_USED (fn) = 1;

I'm still working on a code sample. So far, the sample has to be large to tickle the issue.