On Tue, Jan 23, 2007 at 05:26:21PM +0100, Franck Bui-Huu wrote:

> No, I haven't. Since the code size has been reduced by a factor of 2, I
> would think that the signal code can better fit in instruction cache
> lines. For example, the loop is made up of 11 instructions (I don't
> know why gcc makes it so big though), which fit into 3 cache lines in
> my case, whereas the old code generated 246 instructions for the
> same job, which should cause many more cache misses.
>
> Do you have any pointers on benchmarks I could run?

For stuff like this, microbenchmarks like lmbench are best suited.  Lmbench
recently moved to http://sourceforge.net/projects/lmbench.

  Ralf
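lmbench ships a signal-handling latency test (lat_sig), which is the natural fit for this change. As a rough illustration of what such a microbenchmark measures, here is a minimal C sketch; it is not lmbench's code, and the helper name signal_roundtrip_ns and the use of SIGUSR1 are assumptions made for the example. It relies on the POSIX rule that, in a single-threaded process, a signal a process sends to itself (and does not block) is delivered before kill() returns, so each loop iteration covers the send, the handler invocation and the return from the handler.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Counts handler invocations; sig_atomic_t keeps the access async-signal-safe. */
static volatile sig_atomic_t handled;

static void on_sigusr1(int signo)
{
    (void)signo;
    handled++;
}

/*
 * Hypothetical helper (assumption, not lmbench's API): installs a SIGUSR1
 * handler, sends SIGUSR1 to the calling process `iters` times and returns
 * the average wall-clock cost of one send + delivery + handler + return,
 * in nanoseconds. Returns -1.0 on bad input or if sigaction() fails.
 */
static double signal_roundtrip_ns(long iters)
{
    struct sigaction sa;
    struct timespec t0, t1;

    if (iters <= 0)
        return -1.0;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1.0;

    handled = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < iters; i++)
        kill(getpid(), SIGUSR1); /* handler runs before kill() returns */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)iters;
}
```

To compare the old and new signal code, one would call signal_roundtrip_ns() with a large iteration count on the same machine with each kernel and compare the averages; lmbench's own harness adds warm-up runs and repetitions, which this sketch omits.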