Hi there,

I am wondering which optimization options offer the best runtime performance (speed of execution) on modern hardware. The traditional time vs. space trade-off no longer holds in its old form because of CPU caches: smaller code is often faster at run time, because the higher cache hit rates outweigh the negative effects of jumps.

I am working on rsyslog[1], a GPLed enhanced syslogd replacement. As part of the system infrastructure, I would like to offer maximum speed, so I have begun to dig a bit into the docs (I am by far NO gcc expert user).

One of my project's users suggested using -Os, because the size reduction would probably result in faster execution (due to the caches). However, I see that -Os does not properly align some objects, so I think it can actually result in far worse cache performance. On the other hand, -O3 does things like loop unrolling, which I would consider counter-productive on today's CPU/cache subsystems. I have come to the conclusion that -O2 actually offers the best runtime performance, to be improved only by hand-picking individual optimization options.

I would appreciate feedback on this issue. Is my conclusion correct? Am I overlooking something (maybe something obvious)?

My ultimate goal is to modify the rsyslog build process so that we get the best runtime performance, and I also intend to write some documentation telling expert users how best to tweak it for those high-end systems. As such, I would really appreciate any help in gaining an in-depth understanding. Referrals to links are quite welcome; my googling unfortunately did not turn up anything I considered relevant (in the gcc context).

Many thanks,
Rainer Gerhards

[1] http://www.rsyslog.com
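
P.S.: In case it helps, here is a minimal sketch of the kind of build tweak I have in mind, assuming the usual autoconf-style CFLAGS override; the flags themselves are purely illustrative, not a recommendation:

    # illustrative only; assumes autoconf 2.50+ and a gcc recent
    # enough to know -march=native (4.2 or later)
    ./configure CFLAGS="-O2 -march=native -fomit-frame-pointer"
    make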