Hello,

I have been experimenting with the profile-guided optimization flags in GCC and have a few questions. With the particular application I was profiling, I noticed that in some cases I got worse results after recompiling with -fprofile-use and re-running the same test I had used to generate the profile. I could not find any detailed documentation about how the profile is applied during the optimization passes, but my working hypothesis is as follows.

The application in question is very latency sensitive, so it spins in a tight loop polling for data from various sources. When data is received, it processes it; that processing is the portion where speed matters most, and it is the part I am timing. If one were to look at hit counts for individual functions or source lines, though, the polling loop would appear “hot” while everything else by comparison would look extremely “cold”. Global optimizations that rely on this sort of data would then likely hurt, rather than help, those processing times.

Which optimizations does -fprofile-use enable that would be sensitive to this sort of behavior? Does this seem like a reasonable explanation, and if so, is there anything I can tweak to help? The specific flags that -fprofile-use turns on (-fbranch-probabilities, -fvpt, -funroll-loops, -fpeel-loops and -ftracer) don’t look like they would be affected, since those optimizations appear local in scope, but I assume there are other things happening under the hood.

Regards,
Paul
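
P.S. In case it is useful, here is a minimal sketch of the loop structure I am describing. The names (message_t, poll_source, process_message) are stand-ins for the real code, not the actual application:

    #include <stddef.h>

    typedef struct message message_t;

    /* Hypothetical helpers standing in for the application's real
     * polling and processing routines. */
    message_t *poll_source(void);          /* returns NULL when no data */
    void process_message(message_t *msg);  /* the latency-critical work */

    void event_loop(void)
    {
        for (;;) {
            message_t *msg = poll_source();  /* non-blocking poll */
            if (msg == NULL)
                continue;  /* taken on nearly every iteration, so this
                            * loop dominates the profile's hit counts */

            /* Looks "cold" in the profile by comparison, but this is
             * the path whose latency I actually care about and time. */
            process_message(msg);
        }
    }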