On 04/11/2016 05:17 AM, NightStrike wrote:
> I have a routine that normally completes in just under 3 us. The
> first time through, however, it takes over 18 us. I have found that
> this is due to calling a few math library functions: tanhf, atan2f,
> hypotf, and fmod. Subsequent calls are virtually instant.
>
> I've tried putting __attribute__((optimize("prefetch-loop-arrays")))
> on the outer function, but this isn't much help (which would stand to
> reason, since it's not an issue of caching the data, but caching the
> function.) Is it at all possible to use a magic option or builtin
> that pre-caches the few library functions that I use? It's important
> for my application to reduce the gap of the first cycle time.

I'm not quite sure I understand your problem correctly. But if you are
talking about the time it takes to resolve the math functions from
libm, you might try setting the environment variable LD_BIND_NOW to 1
(see man ld.so). That way all external symbols get resolved at startup
(which makes startup slower) instead of on demand, the first time each
unresolved function is called.

Matthias