On Fri, 23 Jul 2004, Ralf Baechle wrote:

> With a bit of hand waving because haven't done benchmarks I guess Richard
> might be right.  The subroutine calling overhead on modern processors is
> rather low and smaller code means better cache hit rates ...

 Well, I just worry that the call itself may take at least as many
instructions as the callee would if inlined.  In that case there would be
no way for it to be faster.  That may happen for a leaf function -- the
call itself, plus $ra saving/restoration, is already four instructions.
Now it's sufficient for two statics to be needed to preserve temporaries
across such a call and the size of the caller is already the same as with
the inlined code.  With three statics, you lose even for a non-leaf
function.  That's for a function containing a single call to such a
shift -- if there are more, then you may win (but is it common?).  So not
only may it not be faster, but the resulting code may be bigger as well.

 That said, GCC's current implementation of these operations is not
exactly optimal for current MIPS processors.  That's trivial to deal with
in Linux, but would it be possible to pick a different implementation from
libgcc based on the "-march=" setting, too?

  Maciej
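
For reference, here is a minimal C sketch of the kind of operation under discussion.  It assumes the shift in question is a 64-bit left shift on a 32-bit target, built from 32-bit halves.  The name shl64_sketch and the treatment of the shift count are assumptions made for illustration; this is not libgcc's actual code.

```c
#include <stdint.h>

/* Hypothetical helper, not taken from libgcc: a 64-bit left shift
   composed from 32-bit operations, as a 32-bit target has to do it. */
uint64_t shl64_sketch(uint64_t x, unsigned int n)
{
    uint32_t lo = (uint32_t)x;          /* low 32 bits */
    uint32_t hi = (uint32_t)(x >> 32);  /* high 32 bits */

    n &= 63;    /* assumption: the shift count is taken modulo 64 */
    if (n == 0)
        return x;

    if (n >= 32) {
        /* Everything in the low word moves into the high word. */
        hi = lo << (n - 32);
        lo = 0;
    } else {
        /* Bits shifted out of the low word carry into the high word. */
        hi = (hi << n) | (lo >> (32 - n));
        lo <<= n;
    }
    return ((uint64_t)hi << 32) | lo;
}
```

The body is only a handful of operations, which is why the cost of the call and of preserving registers across it can be comparable to the cost of simply inlining the shift.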