On 06/05/11 00:18, Justin Peel wrote:
> I have been going through the bzip2 source code recently and
> discovered that when I forced a certain function, generateMTFValues
> found in compress.c, to not be inlined using
> __attribute__((noinline)), the code sped up 20% when doing
> compression. That function is only called from inside of one function
> - BZ2_compressBlock in the same source file. The function is the most
> CPU intensive of the entire program during compression of an average
> file. I compiled the code with gcc 4.5.2 using options -Wall -Winline
> -O2. I found the speed-up to happen when compressing both a 15MB and a
> 150MB file.
>
> I was wondering if anyone could explain to me why inlining is slowing
> down the code in this case?

You have to read the code.

> I asked on Stack Overflow, but no one was fully sure. The guesses on
> there were that the inlining was making the code larger and thus
> causing more misses of the CPU's instruction cache, or that the
> optimizer wasn't able to use the registers as well when the function
> was inlined.
>
> Also, has anyone else seen this? Does it happen very often?

It's not that unusual. My first guess is that you get a better register
allocation when it's not inlined, but I'd have to see. Are there many
local variables?

Andrew.
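
For anyone who wants to try the experiment on something smaller than bzip2, here is a minimal sketch of the setup being discussed: a hot helper with exactly one caller, marked with __attribute__((noinline)). The names mtf_encode, compress_block and NOINLINE are made up for illustration and are not taken from compress.c; the loop is a plain move-to-front pass, not bzip2's actual generateMTFValues. Building it with and without the attribute (for example with gcc -O2 -S) and comparing the generated assembly is one way to check the register-allocation guess.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* GCC/Clang spelling; other compilers use a different attribute. */
#define NOINLINE __attribute__((noinline))

/* Hypothetical hot helper standing in for generateMTFValues.
   Replaces each byte with its current position in a move-to-front list. */
static NOINLINE void mtf_encode(uint8_t *buf, size_t n)
{
    uint8_t order[256];
    for (int i = 0; i < 256; i++)
        order[i] = (uint8_t)i;

    for (size_t k = 0; k < n; k++) {
        uint8_t sym = buf[k];
        int pos = 0;
        while (order[pos] != sym)   /* every byte value is in the list */
            pos++;
        for (int j = pos; j > 0; j--)   /* shift earlier entries back by one */
            order[j] = order[j - 1];
        order[0] = sym;                 /* move the symbol to the front */
        buf[k] = (uint8_t)pos;
    }
}

/* Hypothetical sole caller standing in for BZ2_compressBlock.
   Returns how many encoded bytes are zero (i.e. repeats of the previous symbol). */
size_t compress_block(uint8_t *buf, size_t n)
{
    assert(buf != NULL || n == 0);
    mtf_encode(buf, n);

    size_t zeros = 0;
    for (size_t k = 0; k < n; k++)
        zeros += (buf[k] == 0);
    return zeros;
}
```

Removing NOINLINE from mtf_encode lets the compiler decide for itself; with a single static caller it will usually inline at -O2, which is the case the original poster found slower.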