Is there any way to collect CPU cache-miss statistics?
Do Intel CPUs collect this information at all?
I have just been to a client where a 1.8GHz P-4 machine with 512KB cache took ~11 minutes to compile a certain file, while a 550MHz P-III with 256KB cache took ~10 *hours*. Virtual memory was never used in either case. top showed g++ taking (almost) all the CPU in user mode. Compiling without optimizations was quick on both machines.
I eventually solved this particular problem by changing the sources in a way that I expected would make the g++ optimizer require less memory. The change indeed worked (2.5min on the fast machine, 5min on the slow machine). I am still wondering, though, whether CPU cache misses were really the problem, and whether there is any way of getting an authoritative answer (as well as a way of debugging this kind of thing without relying on hunches and guesswork).
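To put rough numbers on the gap, here is a minimal sketch of the arithmetic, using only the approximate figures quoted above. The variable and function names are purely illustrative, and clock speed is only a crude proxy for throughput, since the P-III and P-4 differ in more than frequency.

```python
# Figures quoted in the post; the "~" timings are approximate.
FAST_MHZ = 1800   # 1.8GHz P-4
SLOW_MHZ = 550    # 550MHz P-III


def ratio(larger, smaller):
    """Return how many times bigger `larger` is than `smaller`."""
    return larger / smaller


# Clock advantage of the P-4 over the P-III.
clock_ratio = ratio(FAST_MHZ, SLOW_MHZ)

# Compile-time ratio (slow machine / fast machine), in minutes.
before = ratio(10 * 60, 11)   # ~10 hours vs ~11 minutes, original sources
after = ratio(5, 2.5)         # 5min vs 2.5min, after changing the sources

print(f"clock ratio:          {clock_ratio:.1f}x")
print(f"slowdown before fix:  {before:.1f}x")
print(f"slowdown after fix:   {after:.1f}x")
print(f"not explained by clock speed (before fix): {before / clock_ratio:.1f}x")
```

Before the change, the slow machine was roughly 55 times slower, far more than the clock difference alone would suggest; after the change the gap drops to 2 times, which is below the clock ratio. That pattern is what makes me suspect the memory hierarchy, but it does not prove it.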
Many thanks,
Shachar
--
Shachar Shemesh
Open Source integration consultant
Home page & resume - http://www.shemesh.biz/
-- Kernelnewbies: Help each other learn about the Linux kernel. Archive: http://mail.nl.linux.org/kernelnewbies/ FAQ: http://kernelnewbies.org/faq/