Re: [PATCH] MIPS: strnlen_user.S: Fix a CPU_DADDI_WORKAROUNDS regression

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, May 28, 2015 at 06:51:27PM +0100, Maciej W. Rozycki wrote:

> On Thu, 28 May 2015, Ralf Baechle wrote:
> 
> > >  The jump to the delay slot combined with the unusual register usage 
> > > convention taken here made it trickier than it would normally be to make a 
> > > fix that does not regress -- in terms of code size -- unaffected microMIPS 
> > > systems.  I tried several versions and eventually I came up with this one 
> > > that I believe produces the best code in all cases, at the cost of these 
> > > #ifdefs.  I hope they are acceptable.
> > 
> > I think it's all a hint to rewrite the thing in a language that
> > transparently handles the DADDIU issue.  Such as C.  Which would also
> > make using a better algorithm easier.
> 
>  Probably.  One concern that bothers me is the ability of GCC to make 
> alternative entry points into frameless leaf functions.
> 
>  Here we have `__strnlen_kernel_asm' that falls through to 
> `__strnlen_kernel_nocheck_asm'.  That's a nice optimisation (we could 
> probably schedule that `move $v0, $a0' into its preceding delay slot too, 
> even though one might consider it hilarious to have a function's entry 
> point in a delay slot).
> 
>  It would likely be lost in a conversion to C.  But perhaps GCC can get 
> better, or maybe it already has?  I haven't been tracking what's been 
> happening recently on that front.
> 
>  What I have in mind is that given:
> 
> bar() { blah; }
> 
> foo() { blah_blah; bar(); }
> 
> in a single compilation unit, rather than making `foo' tail-jump to `bar' 
> GCC would inline `bar' into `foo' entirely and merely export an additional 
> `bar' entry point in the middle of `foo', where the original body of `bar' 
> begins.

In this particular case we might move the access_ok() in to the
strnlen_user function, something like:

static inline long strnlen_user(const char __user *s, long n)
{
        long res;

        might_fault();

	if (!access_ok(VERIFY_READ, s, 0))
		return 0;

        if (segment_eq(get_fs(), get_ds())) {
                __asm__ __volatile__(
                        "move\t$4, %1\n\t"
                        "move\t$5, %2\n\t"
                        __MODULE_JAL(__strnlen_kernel_nocheck_asm)
                        "move\t%0, $2"
                        : "=r" (res)
                        : "r" (s), "r" (n)
                        : "$2", "$4", "$5", __UA_t0, "$31");
        } else {
                __asm__ __volatile__(
                        "move\t$4, %1\n\t"
                        "move\t$5, %2\n\t"
                        __MODULE_JAL(__strnlen_kernel_nocheck_asm)
                        "move\t%0, $2"
                        : "=r" (res)
                        : "r" (s), "r" (n)
                        : "$2", "$4", "$5", __UA_t0, "$31");
        }

        return res;
}

I'd not be surprised if GCC can optimize that better than the existing
code.

  Ralf





[Index of Archives]     [Linux MIPS Home]     [LKML Archive]     [Linux ARM Kernel]     [Linux ARM]     [Linux]     [Git]     [Yosemite News]     [Linux SCSI]     [Linux Hams]

  Powered by Linux