On 1/23/07, Ralf Baechle <ralf@xxxxxxxxxxxxxx> wrote:
On Tue, Jan 23, 2007 at 03:18:17PM +0100, Franck Bui-Huu wrote:

>    text    data     bss     dec     hex filename
>   11972       0       0   11972    2ec4 arch/mips/kernel/signal.o~old
>    5380       0       0    5380    1504 arch/mips/kernel/signal.o~new

Have you run any benchmarks on this? Unrolling the loops used to make
a noticeable difference.

No, I haven't. Since the code size has been reduced by about a factor
of 2, I would expect the signal code to fit better in the instruction
cache. For example, the loop is made up of 11 instructions (I don't
know why gcc makes it so big, though), which fit into 3 cache lines in
my case, whereas the old code generated 246 instructions for the same
job, which should cause many more cache misses.

Do you have any pointers to benchmarks I could run?

thanks
--
Franck