I have been going through the bzip2 source code recently and found that when I forced one function, generateMTFValues in compress.c, not to be inlined by using __attribute__((noinline)), compression sped up by 20%. That function is called from only one place: BZ2_compressBlock in the same source file. It is the most CPU-intensive function in the entire program when compressing an average file.

I compiled the code with gcc 4.5.2 using the options -Wall -Winline -O2. I saw the speed-up when compressing both a 15MB file and a 150MB file.

Can anyone explain why inlining slows down the code in this case? I asked on Stack Overflow, but no one was fully sure. The guesses there were that inlining made the code larger and so caused more misses in the CPU's instruction cache, or that the optimizer could not use registers as well once the function was inlined.

Also, has anyone else seen this? Does it happen very often?

Thanks in advance,
Justin
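
For reference, here is a minimal sketch of the kind of change described above: a helper with a single caller, marked with __attribute__((noinline)) so that gcc keeps it as a separate function. The names hot_helper and process_block and the loop body are hypothetical stand-ins for illustration only; they are not taken from the bzip2 source.

```c
#include <stddef.h>

/* Hypothetical stand-in for a hot helper such as generateMTFValues.
   The attribute asks gcc not to inline this function into its caller,
   even though it is static and has only one call site. */
static __attribute__((noinline)) unsigned long
hot_helper(const unsigned char *buf, size_t n)
{
    unsigned long sum = 0;
    size_t i;

    /* Illustrative work only: sum the bytes of the buffer. */
    for (i = 0; i < n; i++)
        sum += buf[i];
    return sum;
}

/* Hypothetical single caller, analogous to BZ2_compressBlock being the
   only function that calls generateMTFValues. */
unsigned long process_block(const unsigned char *buf, size_t n)
{
    return hot_helper(buf, n);
}
```

Compiling this with gcc -O2 -S and reading the assembly for process_block should show a call (or tail jump) to hot_helper rather than the loop itself; removing the attribute and recompiling would be one way to compare the two versions.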