Re: How can inlining a function be slowing down this program?

On 5/5/2011 4:18 PM, Justin Peel wrote:
I have been going through the bzip2 source code recently and
discovered that when I forced a certain function, generateMTFValues
found in compress.c, to not be inlined using
__attribute__((noinline)), the code sped up 20% when doing
compression. That function is only called from inside of one function
- BZ2_compressBlock in the same source file. The function is the most
CPU intensive of the entire program during compression of an average
file. I compiled the code with gcc 4.5.2 using options -Wall -Winline
-O2. I found the speed-up to happen when compressing both a 15MB and a
150MB file.

I was wondering if anyone could explain to me why inlining is slowing
down the code in this case? I asked on Stack Overflow, but no one was
fully sure. The guesses on there were that the inlining was making the
code larger and thus causing more misses of the CPU's instruction
cache, or that the optimizer wasn't able to use the registers as well
when the function was inlined.

Also, has anyone else seen this? Does it happen very often?

Thanks in advance,
Justin

Depending heavily on the CPU model, the instruction cache or the loop stream detector can be sensitive to loop body alignment and to the degree of unrolling. This is probably a frequent issue on some CPU models, particularly when the inner loop contains conditional branches. A more detailed explanation is up to you to find, perhaps with the aid of an event-collecting profiler such as oprofile or VTune.
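
For anyone who wants to repeat this kind of comparison, here is a minimal sketch of the setup, not the bzip2 code itself. The names mtf_encode, compress_block, MAYBE_NOINLINE and FORCE_NOINLINE are made up for illustration; the toy move-to-front loop only stands in for a hot function with a conditional branch in its inner loop and a single call site. The only thing taken from the original question is the use of __attribute__((noinline)).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical build switch (an assumption, not part of bzip2):
 * compile with -DFORCE_NOINLINE to keep the hot function out of line,
 * or without it to let GCC make its own inlining decision. */
#ifdef FORCE_NOINLINE
#define MAYBE_NOINLINE __attribute__((noinline))
#else
#define MAYBE_NOINLINE
#endif

/* Toy move-to-front transform standing in for a hot function.
 * This is NOT bzip2's generateMTFValues. */
static MAYBE_NOINLINE void mtf_encode(const unsigned char *in,
                                      unsigned char *out, size_t n)
{
    unsigned char order[256];
    for (int i = 0; i < 256; i++)
        order[i] = (unsigned char)i;

    for (size_t k = 0; k < n; k++) {
        unsigned char c = in[k];
        int pos = 0;

        /* Inner loop with a data-dependent conditional branch. */
        while (order[pos] != c)
            pos++;
        out[k] = (unsigned char)pos;

        /* Move the symbol to the front of the list. */
        memmove(order + 1, order, (size_t)pos);
        order[0] = c;
    }
}

/* Single caller, mirroring the one-call-site situation in the question. */
void compress_block(const unsigned char *in, unsigned char *out, size_t n)
{
    mtf_encode(in, out, n);
}
```

Building the same file twice at -O2, once with -DFORCE_NOINLINE and once without, gives two binaries that differ only in the inlining decision. Timing both on the same input, and then comparing their profiles, is one way to see whether the difference comes from code layout or from something else. Whether this toy shows any difference at all will depend on the compiler version and the CPU.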

