On Thu, May 28, 2015 at 06:51:27PM +0100, Maciej W. Rozycki wrote: > On Thu, 28 May 2015, Ralf Baechle wrote: > > > > The jump to the delay slot combined with the unusual register usage > > > convention taken here made it trickier than it would normally be to make a > > > fix that does not regress -- in terms of code size -- unaffected microMIPS > > > systems. I tried several versions and eventually I came up with this one > > > that I believe produces the best code in all cases, at the cost of these > > > #ifdefs. I hope they are acceptable. > > > > I think it's all a hint to rewrite the thing in a language that > > transparently handles the DADDIU issue. Such as C. Which would also > > make using a better algorithm easier. > > Probably. One concern that bothers me is the ability of GCC to make > alternative entry points into frameless leaf functions. > > Here we have `__strnlen_kernel_asm' that falls through to > `__strnlen_kernel_nocheck_asm'. That's a nice optimisation (we could > probably schedule that `move $v0, $a0' into its preceding delay slot too, > even though one might consider it hilarious to have a function's entry > point in a delay slot). > > It would likely be lost in a conversion to C. But perhaps GCC can get > better, or maybe it already has? I haven't been tracking what's been > happening recently on that front. > > What I have in mind is that given: > > bar() { blah; } > > foo() { blah_blah; bar(); } > > in a single compilation unit, rather than making `foo' tail-jump to `bar' > GCC would inline `bar' into `foo' entirely and merely export an additional > `bar' entry point in the middle of `foo', where the original body of `bar' > begins. In this particular case we might move the access_ok() in to the strnlen_user function, something like: static inline long strnlen_user(const char __user *s, long n) { long res; might_fault(); if (!access_ok(VERIFY_READ, s, 0)) return 0; if (segment_eq(get_fs(), get_ds())) { __asm__ __volatile__( "move\t$4, %1\n\t" "move\t$5, %2\n\t" __MODULE_JAL(__strnlen_kernel_nocheck_asm) "move\t%0, $2" : "=r" (res) : "r" (s), "r" (n) : "$2", "$4", "$5", __UA_t0, "$31"); } else { __asm__ __volatile__( "move\t$4, %1\n\t" "move\t$5, %2\n\t" __MODULE_JAL(__strnlen_kernel_nocheck_asm) "move\t%0, $2" : "=r" (res) : "r" (s), "r" (n) : "$2", "$4", "$5", __UA_t0, "$31"); } return res; } I'd not be surprised if GCC can optimize that better than the existing code. Ralf