On Thu, 28 May 2015, Ralf Baechle wrote:

> > The jump to the delay slot combined with the unusual register usage
> > convention taken here made it trickier than it would normally be to make a
> > fix that does not regress -- in terms of code size -- unaffected microMIPS
> > systems.  I tried several versions and eventually I came up with this one
> > that I believe produces the best code in all cases, at the cost of these
> > #ifdefs.  I hope they are acceptable.
>
> I think it's all a hint to rewrite the thing in a language that
> transparently handles the DADDIU issue.  Such as C.  Which would also
> make using a better algorithm easier.

Probably.  One concern that bothers me is the ability of GCC to make
alternative entry points into frameless leaf functions.  Here we have
`__strnlen_kernel_asm' that falls through to
`__strnlen_kernel_nocheck_asm'.  That's a nice optimisation (we could
probably schedule that `move $v0, $a0' into its preceding delay slot too,
even though one might consider it hilarious to have a function's entry
point in a delay slot).  It would likely be lost in a conversion to C.

But perhaps GCC can get better, or maybe it already has?  I haven't been
tracking what's been happening recently on that front.

What I have in mind is that given:

	bar()
	{
		blah;
	}

	foo()
	{
		blah_blah;
		bar();
	}

in a single compilation unit, rather than making `foo' tail-jump to `bar'
GCC would inline `bar' into `foo' entirely and merely export an additional
`bar' entry point in the middle of `foo', where the original body of `bar'
begins.

  Maciej
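
For illustration only, the `foo'/`bar' shape above could look as follows in C.  This is a minimal sketch, not the kernel routine: the names `strnlen_checked', `strnlen_nocheck' and `range_ok' are hypothetical, the check is a placeholder, and the return value follows plain strnlen() semantics rather than the convention of the assembly version.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for the check done by the checked entry point.
 * It only rejects a null pointer; a real check would look different.
 */
static int range_ok(const char *s, size_t n)
{
	(void)n;
	return s != NULL;
}

/* Plays the role of `bar': the unchecked body, callable on its own. */
size_t strnlen_nocheck(const char *s, size_t n)
{
	size_t len = 0;

	while (len < n && s[len] != '\0')
		len++;
	return len;
}

/*
 * Plays the role of `foo': does the check, then ends with a call to
 * strnlen_nocheck() in tail position.
 */
size_t strnlen_checked(const char *s, size_t n)
{
	if (!range_ok(s, n))
		return 0;
	return strnlen_nocheck(s, n);
}
```

Nothing in the source forces a particular layout here.  Whether the call at the end of `strnlen_checked' becomes a tail jump, an inlined copy of the body, or something closer to the fall-through of the assembly version is left to the compiler and the options in use.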