On Thu, May 5, 2011 at 5:18 PM, Justin Peel <peelpy@xxxxxxxxx> wrote:
> I have been going through the bzip2 source code recently and
> discovered that when I forced a certain function, generateMTFValues
> found in compress.c, to not be inlined using
> __attribute__((noinline)), the code sped up 20% when doing
> compression. That function is only called from inside of one function
> - BZ2_compressBlock in the same source file. The function is the most
> CPU intensive of the entire program during compression of an average
> file. I compiled the code with gcc 4.5.2 using options -Wall -Winline
> -O2. I found the speed-up to happen when compressing both a 15MB and a
> 150MB file.
>
> I was wondering if anyone could explain to me why inlining is slowing
> down the code in this case? I asked on Stack Overflow, but no one was
> fully sure. The guesses on there were that the inlining was making the
> code larger and thus causing more misses of the CPU's instruction
> cache, or that the optimizer wasn't able to use the registers as well
> when the function was inlined.
>
> Also, has anyone else seen this? Does it happen very often?
>
> Thanks in advance,
> Justin
>

I appreciate the replies. I'm quite new at this as is painfully obvious.
I did see when using Oprofile that there are more cache misses when the
function has been inlined. I'm sure that I'll be looking into this more
in a week when I'm back from my trip.

Justin
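
For anyone wanting to try the same experiment, here is a minimal sketch of the kind of change described above. It is not bzip2's code: sum_bytes, process_block and the NOINLINE macro are hypothetical names standing in for a hot helper (such as generateMTFValues) and its single caller (such as BZ2_compressBlock). It only assumes GCC's __attribute__((noinline)), which is the attribute named in the original message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macro: expands to GCC's noinline attribute where available. */
#if defined(__GNUC__)
#  define NOINLINE __attribute__((noinline))
#else
#  define NOINLINE /* no-op on compilers without the GCC attribute */
#endif

/*
 * Hypothetical hot helper, standing in for something like
 * generateMTFValues. It is static and has a single caller, so at -O2
 * GCC would typically inline it; NOINLINE forces a real call instead.
 */
static NOINLINE unsigned long sum_bytes(const unsigned char *buf, size_t len)
{
    unsigned long total = 0;
    size_t i;

    assert(buf != NULL);
    for (i = 0; i < len; i++)
        total += buf[i];
    return total;
}

/* Hypothetical single caller, analogous to BZ2_compressBlock. */
unsigned long process_block(const unsigned char *buf, size_t len)
{
    return sum_bytes(buf, len);
}
```

Building this once with and once without the attribute (for example with -O2, as in the original message) and comparing timings or profiler counters on a realistic workload is one way to check whether inlining helps or hurts in a given case. A toy loop like this one will not necessarily reproduce the bzip2 result.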